Abstract: The growing volumes of XML data available on the Web or produced by enterprises
and organizations raise many performance challenges for data management applications. In this
work, we are concerned with the distributed, peer-to-peer management of large corpora of XML
documents, based on distributed hash table (DHT) overlay networks. We present
ViP2P (standing for Views in Peer-to-Peer), a distributed platform for sharing XML documents
based on a structured P2P network infrastructure (DHT). At the core of ViP2P stand distributed
materialized XML views, defined by arbitrary XML queries, filled in with data published anywhere
in the network, and exploited to efficiently answer queries issued by any network peer. ViP2P
allows user queries to be evaluated over XML documents published by peers in two modes. The
first is a long-running subscription mode, in which a query is registered in the system and
receives answers incrementally, when and if published data matches the query. The second is an
ad-hoc, snapshot mode, in which results are required immediately and must be computed based
on the results of other long-running, subscription queries. ViP2P innovates over other similar
DHT-based XML sharing platforms by using a very expressive structured XML query language.
This expressivity leads to a very flexible distribution of XML content in the ViP2P network, and
to efficient snapshot query execution. ViP2P has been tested in real deployments of hundreds
of computers. We present the platform architecture and its internal algorithms, and demonstrate
its efficiency and scalability through a set of experiments. Our experiments exceed those
reported for comparable systems by orders of magnitude in terms of data volumes, network size
and data dissemination throughput.

Key-words: P2P, XML, DHT, distributed query processing, view-based query evaluation 
The ViP2P platform: XML views in peer-to-peer

Summary: The large volumes of XML data available on the Web, produced by
organizations or individuals, raise significant challenges for efficient data
management. This work is set in the context of managing large volumes of XML
documents in a decentralized, distributed, peer-to-peer network that relies on
a distributed hash table (DHT). In this report, we present ViP2P (views in
peer-to-peer), a distributed platform for sharing XML documents over a DHT
network. At the core of ViP2P are distributed materialized views. These are
defined by any peer, in the form of XML queries. As soon as XML data published
by any peer matches the view definitions, this data is used to contribute to
the content of the views. ViP2P provides two scenarios for evaluating queries
over XML documents. First, there is a "subscription" mode, in which a query
registered in the system receives answers incrementally, as newly published
data contributes to its results. Second, a query can be evaluated solely from
the data already published, by rewriting the query using the materialized
views. We have tested ViP2P deployed in distributed networks of several hundred
computers. In this report, we present its architecture and main algorithms, and
demonstrate its efficiency and scalability through a series of experiments. The
results of our measurements demonstrate the robustness of ViP2P up to data
volumes, data dissemination throughput and network sizes that go beyond (by up
to several orders of magnitude) the measurements previously published for
comparable systems.

Keywords: peer-to-peer, XML, DHT, distributed query execution, view-based
query evaluation
1 Introduction

The volume of data available in the form of XML documents has exploded
since the W3C's 1998 standard, and so have the languages, tools and
techniques for efficiently processing XML data. The interest of distribution in 
this context is twofold. First, a distributed storage and processing network can 
accommodate data volumes going far beyond the capacity of a single computer. 
Second, as organizations and individuals interact more and more, sharing and 
consuming one another's information flows, it is often the case that (XML) data 
sources are produced independently by several distributed sources. The set of 
producers and consumers of data related to a specific topic, e.g., IT journals, 
blogs and online bulletins, is not only distributed, but also dynamic: sources 
may join or leave the system, the set of information consumers or their topics 
of interest may also change over time. Thus, we are interested in the large-scale
management of distributed XML data in a peer-to-peer (P2P) setting. To
provide users with precise, detailed and complete answers to their requests for 
information, we adopt a database-style approach where such requests are for- 
mulated by means of a structured query language, and the system must return 
complete results. That is, if somewhere in the distributed peer network, an an- 
swer to a given query exists, the system will find it and include it in the query 
result. To achieve this, our goal is to build a P2P XML data management 
platform based on a distributed hash table (or DHT, in short [15]).

In this setting, users may formulate two kinds of information requests. First, 
they may want to subscribe to interesting data anywhere in the network, and 
published before or after the subscription is recorded in the system. Our goal
is to persist the subscriptions and ensure that results are returned
as soon as possible following the publication of a matching data source. This
is in the spirit, e.g., of RSS feeds, but extended to a distributed network where 
the source from which interesting data will come is not a priori known. Second, 
users may formulate ad-hoc (snapshot) queries, by which they just seek to obtain 
as fast as possible the results which have already been published in the network. 

The challenges raised by a DHT-based XML data management platform are: 

• building a distributed resource catalog, enabling client producers and con- 
sumers to "meet" in the virtual information sharing space; such a catalog 
is needed both for subscription and ad-hoc queries, 

• efficiently distributing the data of the network to the consumers that have 
subscribed to it and 

• providing efficient distributed query evaluation algorithms for answering 
ad-hoc queries fast. 

In this paper, we present ViP2P, standing for Views in Peer-to-Peer, a dis- 
tributed P2P platform for sharing Web data, and in particular XML data. 
ViP2P is built on top of a structured P2P network infrastructure, and it allows 
each peer in the network to share data with all the other peers. Data sharing in 
ViP2P is twofold. First, each network peer can ask long-running queries which 
are treated as subscriptions, that is, they receive results if and when a document 
published in the system matches such queries. Second, once results are stored 
for such a subscription, they are treated as materialized views based on which 
subsequent ad-hoc queries can be processed with snapshot semantics, i.e., based
only on the data already published in the network. Given such an ad-hoc query,
a ViP2P peer searches the ViP2P network for relevant materialized views, runs
an algorithm for equivalently rewriting the query, and identifies and evaluates a
distributed query evaluation plan which, based on the views, computes exactly the
results of the query on the data published in the system prior to the query.
ViP2P thus fills two kinds of needs: (i) disseminating information in a timely 
fashion to subscriber peers and (ii) re-using pre-computed results to process 
ad-hoc queries efficiently on the existing data only. 

A critical issue when deploying XML data management applications on a 
DHT is the division of tasks between the DHT and the upper layers. The DHT 
software running on each machine allows peers to remain logically connected 
to each other and to look up data based on search keys: a small set of simple, 
light-weight operations. In contrast, powerful XML data management requires 
complex languages (such as the W3C's XPath and XQuery standards), and 
scalable algorithms to cope with complex processing and large data transfers 
(known to raise performance issues in any distributed data management setting).

Experience with our previous DHT-based XML data management platform
KadoP [3] has taught us to load the DHT layer as little as possible, and keep
the heavy-weight query processing operations in the data management layer
and outside the DHT. This has enabled us to build and efficiently deploy a
sizable system (70,000 lines of Java code) which, as we show, scales
up to 250 computers in a WAN and hundreds of GBs of XML data. ViP2P
improves over the state of the art in DHT-based XML data management, since:
(i) it is one of the very few systems actually implemented (together with [3, 33],
and as opposed to prototypes built on DHT simulators), (ii) it is shown to scale to
data volumes that are orders of magnitude beyond the cited competitor systems
and (iii) it has the most expressive XML query language, and the most advanced
capabilities of re-using previously stored XML results, among all similar existing
platforms EH E2 EU [25j OS] . 

ViP2P is part of a family of systems aiming at efficient management of 
XML data in structured peer-to-peer networks [H [U [TU1 [TBI [2H [331 [32] - The 
contributions of this work, with respect to the existing systems, are as follows: 

• We present a complete architecture for query evaluation, both in contin- 
uous (subscription) and in snapshot mode. This architecture enables the 
efficient dissemination of answers to tree pattern queries (expressed in an 
XQuery dialect) to peers which are interested in them, regardless of the 
relative order in time between the data and the subscription publication. 
As in [25], it also allows queries to be answered efficiently in snapshot mode,
based on the content of the existing views materialized in the network,
but using more expressive views, queries and rewritings.

• We have fully implemented our architecture (about 250 classes and 70,000
lines of Java code), on top of the FreePastry [17] P2P infrastructure. We
present a comprehensive set of experiments performed in a WAN, showing 
that (i) the performance of a fully deployed large-scale distributed system 
(and in particular a DHT-based XML management platform) is deter- 
mined by many parameters, beyond the network size and latency which 
can be set in typical P2P network simulators and (ii) the ViP2P archi- 
tecture scales to several hundreds of peers and hundreds of GBs of XML 
data, both unattained in previous works. 
The paper is organized as follows. Section 2 surveys the state of the art
in managing XML data in DHT networks. Section 3 introduces the ViP2P
architecture via an example and describes its main modules. Section 4 presents
the query and view language, as well as query rewriting in ViP2P, while Section 5
concentrates on the materialization, indexing and look-up of materialized views,
at the core of the platform. In Section 6, we present a set of experiments
analyzing the performance of ViP2P data management in a variety of settings
and demonstrating its scalability; we then conclude.
2 State of the art

In this section we present the current state of the art in XML data management
over P2P networks. In Section 2.1 we focus on the differences between structured
and unstructured P2P networks and the reasons behind our choice to use a
structured P2P network for building our platform. In Section 2.2 we present
the closest competing works, which focus on the management of XML data over
structured DHT networks. Section 2.3 stresses the challenges of distributed
XML data management in a real, deployed platform as opposed to simulations.
Finally, in Section 2.4 we present earlier publications on the ViP2P platform.

2.1 Structured vs. unstructured P2P networks 

Peer-to-peer content sharing platforms can be broadly classified in two groups. 
Unstructured peer-to-peer networks allow arbitrary connections among peers, 
that is, each peer may be connected to (or aware of the existence of) one or 
more network peers of its choice. Such network structure typically mimics some 
conceptual proximity between peers interested, for instance, in similar topics. 
Structured peer networks, on the other hand, impose the set of connections 
among peers. A survey of (structured and unstructured) P2P XML sharing 
platforms reflects the state of the art and open issues as of 2005 [24] and a more 
recent survey of XML document indexing and retrieval in P2P networks can be 
found in [2].

The different network structures impact the way in which searches (or queries) 
can be answered in the network. Thus, in unstructured networks, queries are 
forwarded from each peer to its set of known peers (or neighbors) and answers 
are computed gradually as the query reaches more and more peers. For instance, 
in [35], peers are logically organized into clusters that are formed on a document
schema-similarity basis. The superpeers of the network are organized to form a 
tree, where each superpeer hosts schema information about its children. When 
a query arrives it is forwarded to the superpeers. Every superpeer performs 
location assignment: it examines the schemas of the documents of its children 
to detect which peers could possibly contribute results to the query. After the 
contributing peers have been located, the peer that originally posed the query 
builds a location aware algebraic plan and ships the corresponding subqueries 
to their respective peers. The results are then retrieved from each peer and 
the original query is evaluated by performing operations such as joins over the 
subquery results. 

It is easy to see that if query answers reside on a peer very far (in terms of 
peer connections) from the peer where the query originated, this may lead to 
numerous messages and a long query response time. To improve the precision, 
performance and recall of query answering in this context, many approaches 
have been proposed, from the earliest [30] to the very recent [TB], to name just
a few. 

In contrast, structured networks (and their best-known representatives, dis- 
tributed hash tables or DHTs, in short [15]) provide a simple distributed index 
functionality implemented jointly by all the peers. The simplest DHT interface 
provides put(key, value) and get(key) operations allowing the storage of (key, 
value) pairs distributed over all the network peers. More advanced DHT structures,
such as Baton [21, 22] or P2PRing [13, 14], also allow range searches of the form
get(key range). In a DHT, to answer a get request, a bounded number of
messages are exchanged in the network, typically in O(log_2 N), where N is the
number of network peers.
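
The put/get interface described above can be illustrated with a minimal, single-process sketch. It is not FreePastry's API: the class name ToyDHT, the peer names and the hashing scheme are assumptions made for illustration, and the O(log_2 N) routing of a real DHT is not modeled at all.

```python
import hashlib
from collections import defaultdict


class ToyDHT:
    """In-memory stand-in for a DHT: each key is hashed onto one of a fixed set of peers.

    Hypothetical illustration only; message routing between peers is not simulated.
    """

    def __init__(self, peer_ids):
        self.peer_ids = sorted(peer_ids)
        # One local (key -> list of values) store per peer.
        self.storage = {p: defaultdict(list) for p in self.peer_ids}

    def responsible_peer(self, key):
        """Pick the peer in charge of a key from the SHA-1 hash of the key."""
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return self.peer_ids[int.from_bytes(digest, "big") % len(self.peer_ids)]

    def put(self, key, value):
        """Store a (key, value) pair at the peer responsible for the key."""
        self.storage[self.responsible_peer(key)][key].append(value)

    def get(self, key):
        """Return all values stored under the key (empty list if none)."""
        return list(self.storage[self.responsible_peer(key)].get(key, []))


dht = ToyDHT([f"p{i}" for i in range(1, 7)])
dht.put("book", "v1@p2")
print(dht.get("book"))
```

The point of the sketch is only that every peer can compute, from the key alone, which peer holds the associated values; a deployed DHT additionally handles peers joining and leaving.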

In this work, we consider the setting of a structured network, based on a 
DHT, and design an efficient platform for XML query processing in large scale 
networks, based on P2P XML materialized views. The main difference between 
most of the existing platforms and ViP2P is that our system addresses the whole 
processing chain involved in evaluating queries, as opposed to only locating the 
interesting documents and shipping the query to those peers for evaluation. 
The latter approach may, in some cases, require numerous messages at query 
evaluation time and possibly increased response times. ViP2P, in contrast, 
considers the complete chain of query processing based on materialized views 
incrementally built in the network. This enables answering queries by contacting 
only a few peers and possibly re-using complex pre-computed results, stored in 
the views. 

2.2 XML data management based on DHTs 

The first DHT-based platform for XML content sharing was described in [18].
This work proposed a framework for indexing XML documents, based on the
parent-child element paths appearing in the document. Processing a query
involves (i) extracting from the query a set of paths which could serve as lookup
keys, (ii) obtaining via get calls the IDs of all documents matching the paths,
(iii) shipping the query to all the peers holding such documents and (iv)
retrieving the results at the query peer. The approach carries some imprecision
in the case of queries featuring the descendant axis (//) or tree branches. For
instance, the query /a[b]/c could be forwarded to documents in which the paths
/a/b and /a/c occur, but the tree pattern /a[b]/c does not occur. A very similar
approach to DHT-based XML indexing by parent-child paths is taken in [37].
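
The imprecision of path-based indexing can be reproduced on a small case. The sketch below uses a variant of the example with one extra level, the hypothetical query /r/a[b]/c, so that two distinct a elements can each carry one of the two branches; the document and function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def label_paths(xml_text):
    """Return the set of root-to-node parent-child label paths of a document."""
    paths = set()

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        paths.add(path)
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return paths


def matches_tree_pattern(xml_text):
    """Exact check for /r/a[b]/c: some single a child of r has both a b and a c child."""
    root = ET.fromstring(xml_text)
    if root.tag != "r":
        return False
    return any(a.find("b") is not None and a.find("c") is not None
               for a in root.findall("a"))


# The two branches live under different a elements.
doc = "<r><a><b/></a><a><c/></a></r>"
paths = label_paths(doc)

# A path index sees both /r/a/b and /r/a/c, so the document looks relevant...
print({"/r/a/b", "/r/a/c"} <= paths)
# ...although the tree pattern itself does not occur in it.
print(matches_tree_pattern(doc))
```

A peer holding such a document would be contacted for the query and return no result, which is the kind of wasted message the text refers to.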

The above discussion illustrates a common aspect in DHT-based content 
management platforms: imprecision in the indexing method leads to more peers 
being contacted to process a given query. A previous work on managing relational
data based on DHTs [26] has shown that intensive messaging at query
time may seriously limit scaling. Therefore, index precision is generally a desir- 
able feature. 

The work described in [9, 10] considers the setting where XML documents
are divided in fragments distributed among several peers. Each fragment is 
assigned as identifier the parent-child label path going from the document root 
to the root of the fragment, and subsequently, fragments are indexed in the 
DHT by their identifiers. The system uses a particular DHT which can handle 
prefix queries, and thus allows locating XML fragments for which a prefix of the 
path from the root to the fragment is known. Processing linear queries using
only the child axis is simple; however, even simple queries using the descendant axis,
such as //a, need to be forwarded to all the network peers.

The KadoP system [3] indexes XML documents at fine granularity. Thus, for
any element name a, a network peer is in charge of storing the identifiers (or IDs, 
in short) of all a-labeled elements from all the documents in the network. The 
IDs reflect the position of the elements in the respective documents. Therefore, 
any tree pattern query can be answered by retrieving the list of IDs correspond- 
ing to each tree pattern node, and combining these lists via a holistic twig 
join [11]. This indexing model has very high precision, since the output of the 
holistic twig join includes exactly the documents matching the query. However,
the index is much more voluminous than in previous proposals [9, 10, 18, 37],
highlighting the severe limitations in terms of volume of the (key, value) pairs
of the DHT index. Several optimizations in the index structure were introduced
in [3], based on which the KadoP platform was tested on hundreds of peers and
1 GB of data.
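
To make the role of position-reflecting IDs concrete, the sketch below assumes a (document, start, end) interval encoding, which is one common way of building such IDs; the text does not specify KadoP's actual ID format. It also replaces the holistic twig join by a plain nested-loop structural join over two ID lists, which is simpler but less efficient.

```python
from typing import NamedTuple


class ElemID(NamedTuple):
    """Hypothetical structural ID: document name plus start/end positions of the element."""
    doc: str
    start: int
    end: int


def is_ancestor(a, d):
    """a is an ancestor of d if both are in the same document and a's interval encloses d's."""
    return a.doc == d.doc and a.start < d.start and d.end < a.end


def structural_join(ancestors, descendants):
    """Nested-loop join pairing each ancestor ID with its descendant IDs."""
    return [(a, d) for a in ancestors for d in descendants if is_ancestor(a, d)]


# IDs for the document <a><b/></a><a/> wrapped in some root: the first a spans 1..4,
# its b child spans 2..3, the second a spans 5..6.
a_ids = [ElemID("d1", 1, 4), ElemID("d1", 5, 6)]
b_ids = [ElemID("d1", 2, 3)]

# Answering //a//b only needs the two ID lists, not the documents themselves.
pairs = structural_join(a_ids, b_ids)
print(pairs)
```

Under these assumptions, only the first a element is paired with b, so the join returns exactly the matching elements without contacting the peers that hold the documents.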

More recently, the psiX system [32, 33] proposed an XML indexing scheme
based on document summaries, corresponding to the backward simulation image 
of the XML documents (if a DTD is available, summaries can also be built 
based on the DTD). An algebraic signature is associated to each summary and 
to each query. When a query arrives, the algebraic query signature is used 
to look up in a holistic fashion all document signatures matching the query. 
The precision of this indexing scheme improves over KadoP [3] by a better
treatment of wildcard (*) nodes, which KadoP ignores for the most part of 
query processing. From the matching summaries, one can identify the concrete 
corresponding documents, and then push query evaluation to the peers hosting 
the documents. The approach is implemented over the Chord DHT and shown 
to be effective by experiments on up to 11 peers in the PlanetLab network. 

The main difference between the works described in [18, 32, 33, 37] and our
work lies in the approach taken for query processing. These works, of which
psiX [33] can be considered the most advanced, are only concerned with locating
the documents relevant for a query. In contrast, [9, 10], KadoP [3] and the
ViP2P platform presented here address the P2P XML query processing prob- 
lem as a whole. They re-distribute data in the P2P network in order to prepare 
for the evaluation of future queries. KadoP distributes a tag index over the 
peers independently of the data and the queries, which can be seen as a "one 
size fits all" approach. ViP2P allows individual peers to choose the particular 
queries of interest for them, expressed in a rich tree pattern dialect (or,
equivalently, a useful XQuery subset) and then allows exploiting the stored results of
such queries as views for rewriting future queries. An ongoing development of
ViP2P [12] focuses on automatically choosing the views to materialize on each
peer in order to improve observed query processing performance. Thus, going
beyond the problem of locating relevant documents, ViP2P aims at making the
most out of the existing network storage and processing capacity in order to
deliver query results most efficiently to the peers that need them.

Closer in spirit to our work is the cooperative XPath caching approach
described in [25], where peers can store results of a (peer-chosen or
system-imposed) XPath query. The definitions of these stored queries (or views) are
indexed in the network, enabling subsequent queries to be rewritten and
answered based on these views. ViP2P is more general, since (i) our view and
query language is an XQuery dialect with many returning nodes, as opposed to
the simple XPath subset in [25] and (ii) our approach allows rewriting a query
based on several views, whereas [25] can only exploit one view for one query.

DHT-based XML indexing methods El HH1 CHI EH [S3 EZ] are complete, 
i.e., for each query, based on the index, all relevant answers can be computed 
and returned. In ViP2P and [25], peer-chosen views replace the compulsory
index fragments assigned by the network to each peer. Thus, it is possible that 
some queries cannot be processed due to the lack of appropriate views. Our 
focus in ViP2P is on efficiently building and exploiting pre-computed query 
results under the form of materialized views. To guarantee completeness, our 
approach can be coupled with an efficient and compact document-level index,
such as psiX [33], on which to fall back when no suitable views are found for a
given query. 

We conclude our analysis by considering the granularity or level of detail used 
to index XML, i.e., the granularity of the keys inserted in the DHT. Element 
labels (or label paths, or document summaries) have been often used. However, 
this does not allow efficiently locating documents which satisfy specific value or 
keyword search conditions, e.g., //item[price=$45] or
//item[contains(.,'camera')]. Indexing by keywords or text nodes increases index precision but also
significantly increases the index size, since there are many more keywords in an 
XML document than distinct tags. Therefore, the approaches of [HI EHJ HE! l3"2l 
33 37J cannot be easily extended to support keyword search and preserve their 
scalability. A value summary framework is proposed in [18] to index element
values by trading off precision for index space. KadoP [3] indexes all keywords
just like element labels, and proposes index-level optimization techniques to 
cope with important scale-related problems. ViP2P allows keyword and value 
conditions both in the materialized views and in the queries. 

2.3 Managing XML on a DHT: platforms vs. simulations 

Developing distributed systems, and in particular a P2P platform, requires sig- 
nificant efforts. This may be a reason why many previous works in this area 
validate their techniques based on simulated peer networks, where a single
computer runs an analytical model configured to simulate a given network size. Our
INRIA team has invested significant manpower (on the order of 70 man-months
by now) developing the KadoP and then the ViP2P platforms. Our effort has
taught us that many architecture and engineering problems arise due to the 
mismatch between the initial DHT goals (maintaining large dynamic networks 
connected and providing minimal messaging), and the data-intensive operations
required by indexing, storing, and querying large volumes of XML data. We 
have addressed these problems in ViP2P by careful architecture and engineering, 
and report in this paper experiments at a scale (in peers deployed over a WAN, 
and in data size) unattained so far by any other platform. Thus, KadoP [3]
scales up to 1 GB of data over 50 peers, psiX [33] used 262 MB of
data and 11 computers, and in this paper we report on sharing up to 160 GB of
data over up to 250 computers (in all cases, the computers were distributed in
a WAN). 

2.4 Previous publications on ViP2P 

A first version of the platform was described in an informal setting (no proceed- 
ings) in an international workshop [3U] and a national conference [SS]. These 
works used a more restricted query language than we consider here, and de- 
scribed early experiments on a platform which has been much improved since. 
Two ViP2P applications have led to demonstrations: P2P management of RDF
annotations on XML documents [23] and adaptive content redistribution [12].
The details of view-based query rewriting in ViP2P are described in a separate
paper [27]. They can be seen as orthogonal to the architecture and performance
issues described here. 
Figure 1: System overview.



3 ViP2P platform overview 

XML data flows in ViP2P can be summarized as follows. XML documents are 
published independently and autonomously by any peer. Peers can also for- 
mulate subscriptions, or long-running queries, potentially matching documents 
published before, or after the subscriptions. The results of each subscription 
query are stored at the respective peer, and the definition of the query is in- 
dexed in the peer network. Finally, peers can ask ad-hoc queries, which are 
answered in a snapshot fashion (based on the data available in the network so 
far) by exploiting the existing subscriptions, which can be seen as materialized 
views. We detail the overall process via an example in Section 3.1. We then
proceed to describe the ViP2P modules implementing it in Section 3.2.



3.1 ViP2P by example 

A sample ViP2P instance over six peers is depicted in Figure 1; we use it to
illustrate the operations which can be carried out at each peer. In the
figure, XML documents are denoted by triangles, whereas views are denoted
by tables, hinting at the fact that they contain sets of tuples. More details on
views and view semantics are provided in Section 5, but they are not required
to follow the discussion here. For ease of explanation, we adopt the following
naming conventions for the remainder of this paper:

• Publisher is a peer which publishes an XML document 

• Consumer is a peer which defines a subscription and stores its results 
(or, equivalently, the respective materialized view) 

• Query peer is a peer which poses an ad-hoc query (to be evaluated over 
the complete ViP2P network). 

Clearly, a peer can play any subset of these roles simultaneously or successively. 
Figure 2: Sample XML document d1



3.1.1 View publication 

A ViP2P view is a long-running subscription query that any peer can freely
define. The definition (i.e., the actual query) of each newly created view is indexed
in the DHT network. For instance, assume peer p2 in Figure 1 publishes the view
v1, defined by the XPath query //bibliography//book[contains(.,'Databases')].
The view requires all the book items from a bibliography containing the word
'Databases'. ViP2P indexes v1 by inserting in the DHT the following three (key,
value) pairs: (bibliography, v1@p2), (book, v1@p2) and ('Databases', v1@p2).
Here, v1@p2 encapsulates the structured query defining v1, and a pointer to
the concrete database at peer p2 where v1 data is stored. As will be shown
below, all existing and future documents that can affect v1 push the corresponding
data to its database.
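
The indexing step for v1 can be sketched as follows. The dictionary standing in for the DHT and the helper name index_view are assumptions made for illustration; in particular, the element names and keywords are passed in by hand here rather than parsed out of the view definition.

```python
from collections import defaultdict


def index_view(dht, view_id, peer_id, labels, keywords):
    """Register a view under every element name and keyword its definition mentions.

    dht is a plain key -> list mapping standing in for the DHT put operation.
    """
    pointer = f"{view_id}@{peer_id}"
    for key in list(labels) + list(keywords):
        dht[key].append(pointer)
    return pointer


dht = defaultdict(list)

# v1 = //bibliography//book[contains(.,'Databases')], published by peer p2.
index_view(dht, "v1", "p2", labels=["bibliography", "book"], keywords=["Databases"])

for key in sorted(dht):
    print(key, dht[key])
```

Each of the three keys now leads to the same pointer v1@p2, so a peer that looks up any of them learns of the view and of the peer storing its data.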

Peers look up views in the DHT in two situations: when publishing docu- 
ments, and when issuing ad-hoc queries. We detail this below. 

3.1.2 Document publication 

When publishing a document, each peer is in charge of identifying the views
within the whole network to which its document may contribute. For instance,
in Figure 1 (step a), a publisher peer publishes the document d1 (depicted in
Figure 2). Document d1 contains data matching the view v1, as it contains the element
names bibliography and book, as well as the word 'Databases'. The publisher extracts
from d1 all distinct element names and all keywords. For each such element
name or keyword k, the publisher looks up in the DHT the view definitions associated
with k and thus learns about v1 (step b). It then
extracts from d1 the results matching v1; from now on, we use the notation
v1(d1) to designate such results. The publisher sends v1(d1) to p2 (step c), which adds
them to the database storing v1 data.
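
The publication steps above can be sketched on a small hypothetical document (it is not the document d1 of Figure 2, whose content is not reproduced here). The dictionary used as the DHT, the helper names and the hard-coded evaluation of v1 are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document; titles and author names are placeholders.
DOC = """<bibliography>
  <book><title>Principles of Databases</title><author>A. Author</author></book>
  <book><title>Compilers</title><author>B. Writer</author></book>
</bibliography>"""

# State of the DHT after v1 has been indexed by peer p2.
DHT = {"bibliography": ["v1@p2"], "book": ["v1@p2"], "Databases": ["v1@p2"]}


def extract_keys(xml_text):
    """Collect the distinct element names and the distinct words of a document."""
    labels, words = set(), set()
    for elem in ET.fromstring(xml_text).iter():
        labels.add(elem.tag)
        for text in (elem.text, elem.tail):
            if text:
                words.update(re.findall(r"\w+", text))
    return labels, words


def candidate_views(dht, labels, words):
    """Look up every element name and keyword, gathering the views found (step b)."""
    found = set()
    for key in labels | words:
        found.update(dht.get(key, []))
    return found


def evaluate_v1(xml_text):
    """Hard-coded evaluation of //bibliography//book[contains(.,'Databases')]."""
    root = ET.fromstring(xml_text)
    return [book
            for bib in root.iter("bibliography")
            for book in bib.iter("book")
            if "Databases" in "".join(book.itertext())]


labels, words = extract_keys(DOC)
views = candidate_views(DHT, labels, words)
results = evaluate_v1(DOC)   # plays the role of v1(d1), to be sent to p2 (step c)
print(views, [b.find("title").text for b in results])
```

Note that the lookup only yields candidate views: a view found through one shared keyword may still have no results in the document, so the view query has to be evaluated before anything is sent.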

A separate mechanism is needed for a view, say vx, published after d1 but
having results in d1. One possibility would be for the peer publishing vx to
search, among all the network documents, for those that could contain terms from
vx and ask them to contribute vx results. The drawback is that this requires
indexing all documents on all terms, which may be wasteful since a large part
of published content may be looked up rarely, or not at all.

Figure 3: Basic architecture of a ViP2P peer.

Instead, ViP2P associates with each view an interval timestamp, corresponding
to a time interval during which the view was published. Each peer having
published a document d must check the DHT for views published after d. To
that effect, each peer performs regular lookups using as key the time interval
which has just passed. Thus, it retrieves the definitions of all the views published
during that interval and contributes its data if it has not done so already.
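
The periodic lookup can be sketched as below. The fixed 60-second interval, the key format, the class name Publisher and the view pointer vx@p4 are all assumptions made for illustration; the text does not give the actual interval length or key encoding.

```python
from collections import defaultdict

INTERVAL = 60  # assumed interval length in seconds


def interval_key(timestamp):
    """Map a timestamp to the DHT key of the time interval containing it."""
    return f"interval:{int(timestamp) // INTERVAL}"


class Publisher:
    """Peer that has published documents and periodically checks for newer views."""

    def __init__(self, dht):
        self.dht = dht
        self.contributed = set()   # views this peer has already sent data to

    def check_last_interval(self, now):
        """Look up the interval that has just passed; return views not yet served."""
        key = interval_key(now - INTERVAL)
        new_views = [v for v in self.dht.get(key, []) if v not in self.contributed]
        self.contributed.update(new_views)
        return new_views


dht = defaultdict(list)
# A view published at time 130 is also indexed under its time interval.
dht[interval_key(130)].append("vx@p4")

publisher = Publisher(dht)
first = publisher.check_last_interval(now=185)    # interval of time 125, same as 130
second = publisher.check_last_interval(now=190)   # same interval, already handled
print(first, second)
```

The bookkeeping in contributed corresponds to the condition that a peer contributes its data only if it has not done so already.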

3.1.3 Ad-hoc query answering 

ViP2P peers may pose ad-hoc queries, which must be evaluated immediately
(from the previously published data). To evaluate such a query, a ViP2P peer
searches the network for views which may be used to answer it. For instance,
assume the query q = //bibliography//book[contains(.,'Databases')]//author
is issued at peer p5 (step 1 in Figure 1). To process q, p5 looks up the keys
bibliography, book, 'Databases' and author in the DHT, and retrieves a set of
view definitions (step 2), including that of v1. Observe that q can be rewritten
as v1//author; therefore, p5 can answer q just by retrieving v1 and extracting q's
results out of it. A distinguishing feature of ViP2P (step 3) is its ability to
combine several materialized views in order to rewrite a query (as we describe
in Section 4). A query rewriting (a logical plan based on some views) is
translated by the ViP2P query optimizer into a distributed physical plan, specifying
which operators will be used and at which peers they will be executed. The
ViP2P optimizer is responsible for selecting the most efficient physical plan, as
this choice has a significant impact on the query execution time, especially in a
distributed setting such as ours where network communication plays an important
role. The execution of the physical plan may require the cooperation of
various peers, and leads to results being sent to the query peer (step 4).

3.2 ViP2P peer architecture 

We now present the main modules of ViP2P peers as well as their functionalities
and interaction, outlined in Figure 3. The ViP2P Core box includes the main
modules, whereas boxes located outside ViP2P Core are independent external
subsystems that interact with ViP2P.

3.2.1 External Subsystems 

FreePastry DHT [17] provides the underlying DHT layer on which ViP2P is
built. FreePastry is an open-source implementation of Pastry [34], an efficient,
self-organizing and fault-tolerant overlay network. Pastry provides efficient
request routing, deterministic object location, and load balancing. ViP2P nodes
index and look up view definitions on FreePastry's DHT during view
materialization and query processing.

Java RMI is used for all large data transfers. Previous work [3] has shown that
the DHT communication primitives were not suitable for such transfers, since
(i) the DHT get and put operations are blocking, that is, data sent via the DHT
becomes available at the receiver only when it has been completely received and
(ii) message queues in the DHTs overflow easily even after tuning, in which case
the DHT peers re-send the messages, which further clogs the DHT communication
pipes. Beyond the degradation of performance, such message overflows are
problematic because a peer that is too busy trying to re-send data may skip
sending the regular "ping" to its neighbors to signal that it is still alive.
The neighbors then suspect that the peer is down, which triggers further loss
of messages etc.

For all these reasons, we have decided to split inter-peer communication in
two categories. The DHT is used to efficiently send small messages, typically to
index and look up view definitions. We use RMI (which we were able to fine-tune
by writing efficient custom serialization/de-serialization methods, properly
controlling concurrency at the sender and receiver sides etc.) to send larger
messages containing view tuples, when views are materialized and queried. We
also applied specific techniques to reduce the space occupancy of transmitted
tuples. In particular, a document ID (or URI) often appears many times in a
view, as many times as there are view tuples obtained from that document. Since
the URIs are quite large, they make up an important part of the tuple data.
We use dictionary-based encoding of the document URIs, i.e., the tuple sender
dynamically builds a dictionary of all document URIs and sends partial
dictionaries with each tuple packet, to enable decoding on the receiver side.
One could perhaps improve performance even further by coding data-intensive
communications at a lower level (e.g. using plain sockets), but the improvements
attained by our way of utilizing RMI are already very significant.
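
To make the dictionary-based encoding concrete, here is a minimal sketch in
Python. It is an illustration only, not the ViP2P (Java/RMI) implementation:
the function names encode_packets and decode_packets, the packet layout and
the use of small integer codes are all assumptions made for the example.

```python
def encode_packets(tuples, packet_size):
    """Group (uri, payload) tuples into packets of at most packet_size tuples.

    Each packet is (partial_dictionary, encoded_tuples): the partial
    dictionary carries only the codes introduced since the previous packet.
    """
    codes = {}        # URI -> integer code, grows as new URIs appear
    new_entries = {}  # code -> URI, entries not yet sent to the receiver
    batch = []
    for uri, payload in tuples:
        if uri not in codes:
            codes[uri] = len(codes)
            new_entries[codes[uri]] = uri
        batch.append((codes[uri], payload))
        if len(batch) == packet_size:
            yield new_entries, batch
            new_entries, batch = {}, []
    if batch:
        yield new_entries, batch


def decode_packets(packets):
    """Rebuild the original (uri, payload) tuples on the receiver side."""
    known = {}  # code -> URI, accumulated from the partial dictionaries
    for partial_dictionary, batch in packets:
        known.update(partial_dictionary)
        for code, payload in batch:
            yield known[code], payload


# Hypothetical view tuples: the same document URI appears several times.
view_tuples = [
    ("doc://peer1/confs.xml", (1,)),
    ("doc://peer1/confs.xml", (2,)),
    ("doc://peer2/books.xml", (3,)),
    ("doc://peer1/confs.xml", (4,)),
]
packets = list(encode_packets(view_tuples, packet_size=2))
```

Each long URI crosses the network once, in the packet where it first appears;
later tuples from the same document carry only its short code.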

BerkeleyDB Within each peer, view tuples are efficiently stored into a native
store that we built using the Berkeley DB [8] library. It provides the routines
to store, retrieve and sort entries, while guaranteeing ACID transactions when
view data are written and read concurrently.

The GUI facilitates the control and inspection of each peer, enabling users to
publish views and/or pose queries. Screenshots of the ViP2P GUI, along with
other information, can be found on the ViP2P website (footnote 1).
We now move to describing the core modules. 

3.2.2 Document management module 

This module is responsible for looking up views to which the peer's documents
may contribute, extracting the data from the documents and sending it
to the respective consumers.

Footnote 1: http://vip2p.saclay.inria.fr/

Figure 4: Tuple-send/receive protocol use case between a tuple-sender and a
view holder.

View definition lookup When a new document is published by a peer, the
view lookup module at this peer first looks up in the DHT the definitions of the
views to which the document may contribute data, and then passes these view
definitions to the view data extraction module.

View data extraction Given a list of view definitions, the view data extraction 
module at a publisher peer extracts from the document the tuples matching 
each view, and ships them, in a parallel fashion, to the different consumers. 
The view data extractor is capable of simultaneously matching several views on 
a given document. Thus, the corresponding tuples are extracted during a single 
traversal of the document. The extractor maintains a thread pool for setting up 
RMI communications for shipping tuples to the consumers. As our experiments
show in Section 6.3, this parallel tuple sending significantly reduces the time
needed to materialize the views.

3.2.3 View management module 

This module handles view indexing and materialization. 

View indexing This module makes visible to all network peers the definitions 
of all the views declared in the ViP2P network (of course without broadcasting 
them, since most peers are typically not interested in all views). When a new 
view is defined, the indexer inserts in the DHT (key, value) pairs used to describe 
it, based on one of the indexing strategies that we will describe in Section 5.1.

View materialization The view materialization module receives tuples from 
remote publishers and stores them in the respective BerkeleyDB database. In a 
large scale, real-world scenario, thousands of documents might be contributing 
data to a single view. To avoid overload on its incoming data transfers, this 
module implements a back-pressure tuple-send/receive protocol which informs 
the publisher when the incoming tuple buffer is full at the consumer side. Thus, 
a publisher may have to wait until the consumer is ready to accept the tuples. 
This makes the most out of the available publisher-to-consumer bandwidth, all 
the while avoiding costly re-transmissions due to messages lost from overflowing 
queues. 

Figure 4 traces the tuple-send/receive protocol between a tuple extractor
and a view holder. First, the tuple extractor extracts the tuples and keeps them
in memory, ready to ship them to the view holder. After that, it sends a
tuple-send request to the view holder. In this example, the view holder is busy
storing tuples (possibly sent by other tuple extractors), thus it enqueues the
request and responds to the tuple extractor with a "busy" response. When the
view holder is ready to accept the new set of tuples, it dequeues the request and
informs the tuple extractor (via a "ready" message). Then, the tuple extractor
ships the new tuples to the view holder, which finally stores them in the
Berkeley database of the respective view. The view holder can serve multiple
tuple-send requests concurrently. Our experiments (Section 6.3) show how the
concurrency can affect the time needed for a set of views to be materialized.
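
The busy/ready handshake described above can be sketched as a small
single-process simulation in Python. This is an assumption-laden illustration
rather than the ViP2P code: the class name ViewHolder, its methods and the
max_concurrent parameter are hypothetical, and real peers would exchange these
messages over RMI.

```python
from collections import deque


class ViewHolder:
    """Accepts tuple-send requests, serving at most max_concurrent at a time."""

    def __init__(self, max_concurrent=1):
        self.max_concurrent = max_concurrent
        self.active = set()     # senders currently allowed to ship tuples
        self.waiting = deque()  # senders that were told "busy", in FIFO order
        self.stored = []        # stands in for the view's on-disk store

    def request_send(self, sender):
        """Answer a tuple-send request with "ready" or "busy"."""
        if len(self.active) < self.max_concurrent:
            self.active.add(sender)
            return "ready"
        self.waiting.append(sender)
        return "busy"

    def ship(self, sender, tuples):
        """Store the shipped tuples; return the next sender to notify, if any."""
        if sender not in self.active:
            raise RuntimeError("sender must wait for a 'ready' message")
        self.stored.extend(tuples)
        self.active.remove(sender)
        if self.waiting:
            next_sender = self.waiting.popleft()
            self.active.add(next_sender)
            return next_sender  # this sender now receives "ready"
        return None


holder = ViewHolder(max_concurrent=1)
first = holder.request_send("extractor-A")      # slot free: may ship at once
second = holder.request_send("extractor-B")     # holder occupied: enqueued
notified = holder.ship("extractor-A", [("t1",), ("t2",)])
holder.ship("extractor-B", [("t3",)])
```

Because a sender ships only after being told "ready", tuples are never pushed
into a full buffer, which is the property the protocol relies on to avoid
re-transmissions.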



3.2.4 Query management module 

A sequence of steps is required to evaluate queries, each performed by a
dedicated module, as follows.

View lookup This module, given a query, performs a lookup in the DHT 
network retrieving the view definitions that can be used to rewrite the query. 

Query rewriting This module takes a given ad-hoc query and a set of available 
view definitions and produces a logical rewriting plan which, evaluated on some 
views, produces exactly the results required by the query (algorithm detailed 
in [27] and illustrated in Section 4).

Query optimization This module receives as input a logical rewriting plan
output by the query rewriting module and translates it into an optimized
physical plan. The optimization takes place both at the logical level (join
reordering, pushing selections and projections etc.) and at the physical level
(dictating the exact flow of data during query execution, selecting the
appropriate physical operators etc.).

Query execution This module provides a set of physical operators which can
be executed by any ViP2P peer, implementing the standard iterator-based
execution model [20]. Since ViP2P is a distributed application, operators can be
deployed to peers and executed remotely. The query optimization module decides
which parts of a physical plan each peer executes.

Data exchange operators are an essential part of a distributed execution plan. 
To that end, ViP2P implements two data exchange operators: the Send and 
Receive operators that permit data exchange across peers. They are always used 
in pairs: whenever a data sender peer executes a Send operator, the data receiver 
executes its respective Receive operator. Send and Receive are implemented 
using asynchronous communication buffers (tuples are not sent through the 
network one by one but in buckets of specified size) and data is transferred 
via RMI. To reduce the transferred data volumes, document URIs (present in 
each view tuple to identify the document the tuple was extracted from) are 
compressed using a dictionary by the Send and decompressed by the Receive, as
described in Section 3.2.1.

ViP2P implements the typical Selection, Projection, Hash Join, Nested Loop
Join and Merge Join operators. Moreover, it uses the XML specific operators
Holistic Twig Join [11], Structural Ancestor Join and Structural Descendant
Join [5], performing structural joins based on the structural identifiers (IDs)
of the incoming tuples. The Navigation operator corresponds to the logical
navigation operator, described in Section 4.3. Two sorting operators are
available: an in-memory sort operator, Memory Sort, and an external memory sort
based on BerkeleyDB.
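
As an illustration of the iterator-based execution model, the following Python
sketch composes a few operators as generators, so that each operator pulls
tuples from its input on demand. The operator names, the dictionary-based
tuple representation and the sample data are assumptions made for the example;
the actual ViP2P operators are Java classes that may run on different peers.

```python
def scan(rows):
    """Leaf operator: iterate over the tuples of a stored view."""
    for row in rows:
        yield row


def select(child, predicate):
    """Selection: keep only the tuples satisfying the predicate."""
    for row in child:
        if predicate(row):
            yield row


def project(child, columns):
    """Projection: keep only the listed attributes of each tuple."""
    for row in child:
        yield {column: row[column] for column in columns}


def hash_join(left, right, key):
    """Hash join on a common attribute: build on left, probe with right."""
    table = {}
    for left_row in left:
        table.setdefault(left_row[key], []).append(left_row)
    for right_row in right:
        for left_row in table.get(right_row[key], []):
            yield {**left_row, **right_row}


# Hypothetical view contents, loosely following the shape pi(sigma(v1 join v2)).
v1_rows = [{"pid": 1, "country": "FR"}, {"pid": 2, "country": "US"}]
v2_rows = [{"pid": 1, "tval": "Title A"}, {"pid": 3, "tval": "Title B"}]

plan = project(
    select(hash_join(scan(v1_rows), scan(v2_rows), "pid"),
           lambda row: row["country"] == "FR"),
    ["tval"],
)
results = list(plan)
```

Nothing is computed until the root of the plan is iterated, which is what
allows a Send operator on one peer to stream tuples to a Receive operator on
another.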
  1  q      :=  for absVar (, (absVar | relVar))*
                (where pred (and pred)*)? return ret
  2  absVar :=  x_i in doc(uri) p
  3  relVar :=  x_i in x_j p      // x_j introduced before x_i
  4  pred   :=  string(x_i) = (string(x_j) | c)
  5  ret    :=  <l> elem* </l>
  6  elem   :=  <l_i>{ (x_k | id(x_k) | string(x_k)) }</l_i>

Figure 5: Grammar for views and queries.

4 Views, queries and rewritings 

Once an ad-hoc query is issued by a peer, as described in Section 3, a DHT
lookup retrieves the definitions of the existing ViP2P network views which could
be used to answer the query (for more details, see Section 5.2). Then, the
query peer runs its own algorithm for rewriting the query using the respective
materialized views. The algorithm used in ViP2P is presented in [57] and its
details are out of the scope of this paper, where we are mainly concerned with
the platform and its scalability. However, to make this paper self-contained, we
present the XQuery dialect we consider (Section 4.1), a joined tree pattern
formalism that conveniently represents queries and views (Section 4.2), and
our algebraic rewritings based on views (Section 4.3).



4.1 XQuery dialect 

Let L be a set of XML node names, and XP be the XPath{/, //, []} language [31].
We consider views and queries expressed in the XQuery dialect described in
Figure 5. In the for clause, absVar corresponds to an absolute variable
declaration, which binds a variable named x_i to a path expression p ∈ XP to be
evaluated starting from the root of some document available at the URI uri. The
non-terminal relVar allows binding a variable named x_i to a path expression
p ∈ XP to be evaluated starting from the bindings of a previously-introduced
variable x_j. The optional where clause is a conjunction over a number of
predicates, each of which compares the string value of a variable x_i either
with the string value of another variable x_j, or with a constant c.

The return clause builds, for each tuple of bindings of the for variables, a
new element labeled l, having some children labeled l_i (l, l_i ∈ L). Within
each such child, we allow one out of three possible information items related
to the current binding of a variable x_k, declared in the for clause: (1) x_k
denotes the full subtree rooted at the binding of x_k; (2) string(x_k) is the
string value of the binding; (3) id(x_k) denotes the ID of the node to which
x_k is bound.

There are important differences between the subtree rooted at an element
(or, equivalently, its content), its string value and its ID. The content of x_i
includes all (element, attribute, or text) descendants of x_i, whereas the
string value is only a concatenation of x_i's text descendants [39]. Therefore,
string(x_i) is very likely smaller than x_i's content, but it holds less
information. Moreover, an XML ID does not encapsulate the content of the
corresponding node. However, XML IDs enable joins which may stitch together
tree patterns into larger ones. We assume structural IDs, i.e., comparing the
IDs of two nodes n_1 and n_2 allows determining if n_1 is a parent (or
ancestor) of n_2. Our XQuery dialect distinguishes structural IDs, value and
contents, and allows any subset of the three to be returned for any of the
variables, resulting in significant flexibility.
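
To illustrate what a structural ID makes possible, the sketch below labels
each node with a (pre, post, depth) triple and decides the parent and ancestor
relationships by comparing IDs only. The (pre, post, depth) scheme is one
common choice and is an assumption here; the text above does not say which
structural ID scheme ViP2P uses, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def assign_ids(root):
    """Map each element to a (pre, post, depth) structural ID."""
    ids = {}
    counters = {"pre": 0, "post": 0}

    def visit(node, depth):
        pre = counters["pre"]          # rank in document (pre-order) order
        counters["pre"] += 1
        for child in node:
            visit(child, depth + 1)
        post = counters["post"]        # rank in post-order
        counters["post"] += 1
        ids[node] = (pre, post, depth)

    visit(root, 0)
    return ids


def is_ancestor(id_a, id_d):
    """True if the node with ID id_a is a proper ancestor of the node id_d."""
    return id_a[0] < id_d[0] and id_a[1] > id_d[1]


def is_parent(id_a, id_d):
    """True if the node with ID id_a is the parent of the node id_d."""
    return is_ancestor(id_a, id_d) and id_a[2] + 1 == id_d[2]


book = ET.fromstring("<book><author><last/></author><title/></book>")
ids = assign_ids(book)
author, title = book[0], book[1]
last = author[0]
```

Both tests use only the two IDs, never the document itself, which is why IDs
stored in different views can still be joined structurally.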
  q    for     $p in doc("confs")//confs//SIGMOD/paper, $y1 in $p/year,
               $a in $p//author[email], $c1 in $a/affiliation//country,
               $b in doc("books")//book, $y2 in $b/year, $e in $b/editor,
               $t in $b//title, $c2 in $b//country
       where   $e='ACM' and $y1=$y2 and $c1=$c2
       return  <res> <tval>{string($t)}</tval> </res>

  v_1  for     $p in doc("confs")//confs//paper, $a in $p/affiliation
       return  <v1> <pid>{id($p)}</pid> <aid>{id($a)}</aid>
               <acont>{$a}</acont> </v1>

  v_2  for     $b in doc("books")//book, $c in $b//country, $e in $b/editor,
               $t in $b/title, $y1 in $b/year, $p in doc("confs")//SIGMOD/paper,
               $y2 in $p/year, $a in $p//author[email]
       where   $e='ACM' and $y1=$y2
       return  <v2> <cval>{string($c)}</cval> <tval>{string($t)}</tval>
               <pid>{id($p)}</pid> <aid>{id($a)}</aid> </v2>

  r    for     $v1 in doc("v1.xml")//v1, $p1 in $v1/pid, $af1 in $v1/aid,
               $c1 in $v1//acont//country, $v2 in doc("v2.xml")//v2,
               $c2 in $v2/cval, $t2 in $v2/tval, $p2 in $v2/pid, $a2 in $v2/aid
       where   $p1=$p2 and parent($a2,$af1) and $c1=$c2
       return  <res> <tval>{$v2}</tval> </res>

Figure 6: XQuery query, views, and rewriting.



For illustration, Figure 6 shows a query q in our XQuery dialect, as well
as two views v_1 and v_2. The parent custom function returns true if and only if
its inputs are node IDs, such that the first identifies the parent of the second.
Moreover, as usual in XQuery, the variable bindings that appear in the where
clauses imply the string values of these bindings (e.g. $e='ACM' is implicitly
converted to string($e)='ACM').

4.2 Joined tree patterns 

We use a dialect of joined tree patterns to represent views and queries.
Formally, a tree pattern is a tree whose nodes carry labels from L and may be
annotated with zero or more among: ID, val and cont. A pattern node may
also be annotated with a value equality predicate of the form [=c] where c is
some constant. The pattern edges are either simple for parent-child or
double for ancestor-descendant relationships. A joined tree pattern is a set of
tree patterns, connected through value joins, which are denoted by dashed edges.
For illustration, Figure 7 depicts the (joined) tree pattern representations of
the query and views shown in XQuery syntax in Figure 6. In short, the semantics
of an annotated tree pattern against a database is a list of tuples storing the
ID, val and cont of the tuples of database nodes in which the tree pattern
embeds. The tuple order follows the order of the embedding target nodes in the
database. The detailed semantics feature some duplicate elimination and
projection operators (from the algebra we will detail next), in order to be as
close to the W3C's XPath 2.0 semantics as possible. The only remaining
difference is that tree patterns return tuples, whereas standard XPath/XQuery
semantics uses node lists. Algebraic operators for translating between the two
are by now well understood [2H]. The semantics of a joined tree pattern is the
join of the semantics of its component tree patterns.

Translating from our XQuery dialect to the joined tree patterns is quite
straightforward. The only part of the XQuery syntax not reflected in the joined
tree patterns is the names of the elements created by the return clause. These
names are not needed when rewriting queries based on views. Once a rewriting
has been found, the query execution engine creates new elements out of the
returned tuples of XML elements, values and/or identifiers, using the names
specified by the original query, as explained in [36]. From now on, for
readability, we will only use the tree pattern representations of views and
queries.

Figure 7: Pattern query and views, and algebraic rewriting

4.3 Rewritings & algebra 

A rewriting is an XQuery query expressed in the same dialect as our views
and queries, but formulated against XML documents corresponding to
materialized views. For instance, the XQuery expression r in Figure 6 is
an equivalent rewriting of the query q using the views v_1 and v_2 in the same
figure.

An alternative, more convenient, way to view rewritings is under the form
of logical algebraic plans. We will now present the logical operators that are
used to express the view rewritings. We denote by ≺ the parent comparison
operator, which takes as input arguments two IDs and returns true if the node
corresponding to the left-hand ID argument is the parent of the node
corresponding to the right-hand ID. The ancestor comparison operator, denoted
≺≺, is defined in a similar way. Observe that ≺ and ≺≺ are only abstract
operators here (we do not make any assumption on how they are evaluated).

We consider an algebra on tuple collections (such as described in the previous
Section) whose main operators are: (1) scan of all tuples from a view v, denoted
scan(v) (or simply v for brevity, whenever possible); (2) cartesian product,
denoted ×; (3) selection, denoted σ_pred, where pred is a conjunction of
predicates of the form a ⊙ c or a ⊙ b, a and b are tuple attributes, c is some
constant, and ⊙ is a binary operator among {=, ≺, ≺≺}; (4) projection, denoted
π_cols, where cols is the list of attributes that will be projected;
(5) navigation, denoted nav_{a,np}, which is a unary algebraic operator,
parameterized by the name a of one of its input columns and a tree pattern np.
Column a must correspond to a cont attribute in the input of nav. Let t be a
tuple in the input of nav, and np(t.a) be the result of evaluating the pattern
np on the XML fragment stored in t.a. Then, nav_{a,np} outputs the tuples
{t × np(t.a)}, for each tuple t of the input.

Figure 8: Sample input and output to a logical nav operator.

Figure 8 illustrates the effect of nav when applied on a sample input operator
op. The parameters to this nav are cont_book, the name of the column containing
<book> elements, and the tree pattern //author. The first tuple output by
nav is obtained by augmenting the corresponding input tuple with a cont_author
attribute containing the single author-labeled child of the element found in its
cont_book attribute. The second and third nav output tuples are similarly
obtained from the last tuple produced by op. Observe that the second tuple in
op's output has been eliminated by the nav since it had no <author> element in
its cont_book attribute.
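
The behaviour of nav on such an input can be reproduced with a short Python
sketch. It handles only the simplest tree pattern, a single descendant label
such as //author, and represents tuples as dictionaries whose cont attributes
hold serialized XML; the function name nav, its parameters and the sample
tuples are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET


def nav(rows, column, label, out_column):
    """For each input tuple t, output t extended with each match of //label.

    Tuples whose fragment has no matching descendant produce no output.
    """
    for row in rows:
        fragment = ET.fromstring(row[column])
        for match in fragment.iter(label):
            if match is fragment:
                continue  # the pattern asks for descendants of the root only
            yield {**row, out_column: ET.tostring(match, encoding="unicode")}


# Hypothetical output of the input operator op: three <book> fragments.
op_rows = [
    {"id": 1, "cont_book": "<book><author>A</author></book>"},
    {"id": 2, "cont_book": "<book><title>T</title></book>"},
    {"id": 3, "cont_book": "<book><author>B</author><author>C</author></book>"},
]
nav_rows = list(nav(op_rows, "cont_book", "author", "cont_author"))
```

As in the description above, the first input tuple yields one output tuple,
the second is eliminated, and the last yields two.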

The algebra also includes the join operator, defined as usual, sort and
duplicate elimination. For illustration, at the bottom of Figure 7, we depict
the algebraic representation of the rewriting r shown in XQuery syntax at the
bottom of Figure 6.

An important feature of the rewritings we consider is minimality: our rewriting
algorithm [27] outputs only rewriting expressions in which no view instance
can be removed and still get an equivalent rewriting for a given query. For
instance, the rewriting plan in Figure 7, of the form π(σ(v_1 ⋈ v_2)), is
minimal. In contrast, a rewriting for the same query of the form
π(σ(v_1 ⋈ v_2 ⋈ v_3)), using also view v_3 although it is not needed, is not
minimal. Considering only minimal rewritings allows for more efficient query
execution plans: a non-minimal plan is always less efficient in terms of query
execution time than its corresponding minimal one.

5 ViP2P view management

Materialized views stand at the heart of data sharing in ViP2P. Sections 5.1 and
5.2 show how view definitions are indexed and looked up in the DHT in order
to be retrieved for view materialization and query rewriting, respectively.

5.1 View definition indexing & lookup for view materialization

This Section describes how published data and views "meet", i.e., how ViP2P
ensures that for each view v and document d, the data obtained by evaluating v
over d, denoted v(d), is eventually computed and stored at the peer having
defined v. Two cases arise, depending on the publication order of v and d.

View published before the document In this case, the view definitions
are indexed using as keys all the labels (node names and words) of the view.
Figure 10 shows eight views. To index v_1, ViP2P issues the calls put(book, v_1)
and put(title, v_1) to the DHT. Observe that these calls index the definition
of v_1 (not its data) on the keys book and title. Similarly, v_2 is indexed on
the keys book, author and last, v_3 using the keys paper, author and last etc.
When the document in Figure 2 is published, get calls are issued with the keys
bibliography, book, paper, title, author, year, Found, of Databases and all
the other labels and keywords of the document. The result is a superset of the
definitions of the views that the document might affect. In this case the views
v_1 to v_8 are retrieved.
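
The put and get calls above can be mimicked with an in-memory stand-in for the
DHT. The Python sketch below is illustrative only: ToyDHT, index_view and
candidate_views are hypothetical names, a real deployment would route each key
to the responsible peer through FreePastry, and only the three views whose
keys are listed in the text are indexed.

```python
from collections import defaultdict


class ToyDHT:
    """Single-process stand-in for a DHT mapping a key to a set of values."""

    def __init__(self):
        self._store = defaultdict(set)

    def put(self, key, value):
        self._store[key].add(value)

    def get(self, key):
        return set(self._store.get(key, ()))


def index_view(dht, view_name, labels):
    """Index the view definition (not its data) under each of its labels."""
    for label in labels:
        dht.put(label, view_name)


def candidate_views(dht, document_labels):
    """Views that a newly published document might contribute to."""
    found = set()
    for label in document_labels:
        found |= dht.get(label)
    return found


dht = ToyDHT()
index_view(dht, "v1", ["book", "title"])
index_view(dht, "v2", ["book", "author", "last"])
index_view(dht, "v3", ["paper", "author", "last"])

document_labels = ["bibliography", "book", "paper", "title", "author", "year"]
candidates = candidate_views(dht, document_labels)
```

The result is a superset of the useful views: a view is returned as soon as it
shares one label with the document, so the publisher still has to evaluate
each candidate view on the document.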

[Timeline showing, in publication order, d_1, v_1, v_2, v_3 and d_2 on a time
axis with ticks t_i, t_{i+1}, ...]

Figure 9: Sample timeline of view and document publication.



View published after the document ViP2P ensures that views are kept up
to date (allowing some time for the data to circulate across the network). Thus,
when a view is published, it should be filled in with data from all the
previously published documents matching the view. To achieve this, ViP2P
associates to each view an interval timestamp, corresponding to a time interval
during which the view was published, and indexes each view definition in the
DHT using as key the corresponding timestamp. As illustrated in Figure 9, v_1
belongs to (was published in) the interval (t_{i+1}, t_{i+2}], v_2 to the
interval (t_{i+2}, t_{i+3}] and v_3 to (t_{i+3}, t_{i+4}].

Figure 10: Sample views and queries.

Each peer having published a document d must check the DHT for views
that may have appeared after d. To that effect, each peer performs regular
lookups using as key the time interval that has just finished. This retrieves the
definitions of all views published during that interval. The peer then checks,
for each of its documents, if the document has already contributed to that view
(this information is stored locally at the peer). If this is not the case, the
peer checks if that document holds any data for these views and if so, extracts
and sends the corresponding data to the view holder. In Figure 9, document d_1
arrives during the time interval (t_i, t_{i+1}]. With the help of the
timestamped view index, we discover the views v_1, v_2 and v_3 which arrived
later. Notice also that document d_2 is published after the views and thus is
treated according to the first case above.
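
A possible shape for the timestamp-based catch-up is sketched below in Python.
Several details are assumptions made for the example: intervals have a fixed
length and are numbered so that interval k covers ((k-1)*length, k*length],
timestamps are integers, and the names TimestampIndex, DocumentPeer and
catch_up are hypothetical. The actual extraction and shipping of tuples is
left out; the sketch only returns the (document, view) pairs still to be
processed.

```python
from collections import defaultdict


def interval_key(timestamp, length):
    """Number k of the interval ((k-1)*length, k*length] holding timestamp."""
    return (timestamp + length - 1) // length


class TimestampIndex:
    """Stand-in for the DHT entries keyed by publication interval."""

    def __init__(self, length):
        self.length = length
        self.views_by_interval = defaultdict(set)

    def publish_view(self, view, timestamp):
        key = interval_key(timestamp, self.length)
        self.views_by_interval[key].add(view)

    def views_in(self, interval):
        return set(self.views_by_interval.get(interval, ()))


class DocumentPeer:
    """A peer that has published documents and periodically catches up."""

    def __init__(self, index, documents):
        self.index = index
        self.documents = list(documents)
        self.contributed = set()  # (document, view) pairs already handled

    def catch_up(self, now):
        """Look up the interval that has just finished; return pending work."""
        finished = interval_key(now, self.index.length) - 1
        work = []
        for view in sorted(self.index.views_in(finished)):
            for document in self.documents:
                if (document, view) not in self.contributed:
                    self.contributed.add((document, view))
                    work.append((document, view))
        return work


index = TimestampIndex(length=10)
index.publish_view("v1", timestamp=15)   # falls in interval 2, i.e. (10, 20]
peer = DocumentPeer(index, ["d1"])       # d1 was published before v1
pending = peer.catch_up(now=25)          # interval 2 has just finished
```

The locally stored set of handled pairs is what keeps a document from
contributing twice to the same view when lookups are repeated.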



5.2 View definition indexing & lookup for query rewriting

View definitions are also indexed in order to find views that may be used to 
rewrite a given query. In this context, a given algorithm for extracting (key, 
value) pairs out of a view definition is termed a view indexing strategy. For each 
such strategy, a view lookup method is needed, in order to identify, given a query 
q, (a superset of) the views which could be used to rewrite q. Many strategies 
can be devised. We present four that we have implemented, together with the 
space complexity of the view indexing strategy, and the number of lookups 
required by the view lookup method. We also show that these strategies are 
complete, i.e., they retrieve at least all the views that could be embedded in q 
and, thus, could potentially lead to q rewritings. 



5.2.1 Label indexing (LI) 

In this strategy we index v by each v node label (either some element or
attribute name, or word). The number of (key, value) pairs thus obtained is in
O(|v|), where |v| is the number of nodes of the view.

View lookup for LI The lookup is performed by all node labels of q. The
number of lookups is O(|q|), where |q| is the number of nodes in the query.
Figure 10 depicts some sample queries. The LI lookup keys for q_1 are book,
title, author and last, retrieving all the views of Figure 10. Note that some of
these cannot be used to equivalently rewrite q_1. For instance, v_3 has data
about papers, while q_1 asks for books. Similarly, LI indexing and lookup for
q_2 and q_3 leads to retrieving all the views. This shows that LI has many
false positives.

LI completeness If LI is not complete, then there exists a view v that can be 
used to rewrite a query q, and v is not retrieved when searching by all q labels. It 
has been shown [38] that in order for a view to appear in an equivalent rewriting
of a query, there must exist an embedding (homomorphism) from the view into 
the query, which entails that some node labels must appear in both. If in our 
case v and q have no common node label, this contradicts the hypothesis that 
v was useful to rewrite q. 

The LI strategy coincides with the view definition indexing for
document-driven lookup (described previously). An interesting variant can
furthermore be elaborated.



5.2.2 Return label indexing (RLI) 

Here, we index v by the labels of all v nodes which project some attributes (at
most |v|). For instance, in Figure 10 the index keys for v_1 are book and title,
for v_2 they are book and last, for v_3 paper and last etc., up to v_8, which is
indexed by RLI on the keys book and year.

View lookup for RLI The view definition lookup is the same as for LI (look
up on all query node labels). In Figure 10, the definitions of v_1 to v_3 and
v_5 to v_8 will be retrieved for q_1. For q_2, the definitions of v_1, v_2, v_6,
v_7 and v_8 will be retrieved. An RLI lookup for q_3 will retrieve v_1 to v_8.
Observe that RLI leads to fewer view definitions being retrieved than LI.

RLI completeness Suppose that there is a view v which can be used to rewrite
a query q, yet the definition of v is not retrieved by RLI lookup. This means
that either (i) v does not store any attributes or (ii) the labels of v nodes
that project an attribute do not appear in q. Case (i) is not possible because
a view that participates in a rewriting should store at least one attribute,
and case (ii) is also not possible since it contradicts the existence of an
embedding from v to q, required for v to be useful in rewriting q.



5.2.3 Leaf path indexing (LPI) 

Let LP(v) be the set of all the distinct root-to-leaf label paths of v. Here, a
path is just the sequence of labels encountered as one goes down from the root
to the node, and does not reflect the type of the edges. We index v using each
element of LP(v) as key. The number of (key, value) pairs thus obtained is in
O(|LP(v)|). Going back to Figure 10, v_1 is indexed on the key book.title, v_2
with the key book.author.last etc. The view v_8, composed of two tree patterns,
is indexed using the keys book.author, paper.author and paper.year.

View lookup for LPI Let LP(q) be the set of all the distinct root-to-leaf
label paths of q. Let SP(q) be the set of all non-empty sub-paths of some path
from LP(q), i.e., each path from SP(q) is obtained by erasing zero or more
labels from a path in LP(q). Use each element in SP(q) as lookup key. For
example, the LPI lookup for q_1 of Figure 10 uses the keys book.title, book,
title, book.author.last, book.author, author.last, book.last, author and last.
Note that LPI lookup for q_1 does not retrieve the definitions of the views
v_3, v_4 and v_7, which previous strategies retrieved, although they are not
useful to rewrite q_1. LPI can still have some false positives though: a lookup
for q_2 retrieves v_5, v_6 and v_8, none of which can be used to rewrite q_2
(in this example, q_2 simply has no rewriting). The lookup for q_3 retrieves
the views v_1, v_5, v_6 and v_8. The filtering is very good in this case
because among these only v_5 cannot be used to rewrite q_3.

Let h(q) be the height of q and l(q) be the number of leaves in q. The number
of LPI lookups is bound by $\sum_{p \in LP(q)} 2^{|p|} \le l(q) \times 2^{h(q)}$.
If the query q is a join of tree patterns (tpqs) then the bound becomes
$\sum_{tpq \in q} \left( \sum_{p \in LP(tpq)} 2^{|p|} \right)$.

LPI completeness is guaranteed by the fact that if a view v can be embedded
in the query q, then LP(v) ⊆ SP(q).
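
The index and lookup keys of LPI can be computed with a few lines of Python.
In this sketch a tree pattern is a nested (label, children) pair and edge
types are ignored, as in the definition of LP. The shapes given to q_1, v_1
and v_3 are assumptions reconstructed from the keys listed in the text (the
figure itself is not available here), and the function names are hypothetical.

```python
from itertools import combinations


def leaf_paths(pattern):
    """LP: the set of root-to-leaf label paths of a (label, children) tree."""
    label, children = pattern
    if not children:
        return {(label,)}
    return {(label,) + path
            for child in children
            for path in leaf_paths(child)}


def sub_paths(paths):
    """SP: all non-empty paths obtained by erasing labels from some path."""
    result = set()
    for path in paths:
        for size in range(1, len(path) + 1):
            # combinations() keeps the labels in their original order
            result.update(combinations(path, size))
    return result


def lpi_index_keys(view):
    """Keys under which a view definition is indexed by LPI."""
    return {".".join(path) for path in leaf_paths(view)}


def lpi_lookup_keys(query):
    """Keys looked up by LPI for a query."""
    return {".".join(path) for path in sub_paths(leaf_paths(query))}


# Assumed shapes, inferred from the keys quoted in the text.
q1 = ("book", [("title", []), ("author", [("last", [])])])
v1 = ("book", [("title", [])])
v3 = ("paper", [("author", [("last", [])])])

lookup_keys = lpi_lookup_keys(q1)
v1_retrieved = lpi_index_keys(v1) <= lookup_keys
v3_retrieved = lpi_index_keys(v3) <= lookup_keys
```

For this q_1 the two leaf paths have lengths 2 and 3, so the bound on the
number of lookups is 2^2 + 2^3 = 12, while the number of distinct keys
actually looked up is smaller because the sub-path book is shared.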

5.2.4 Return path indexing (RPI)

RPI is the last strategy that we consider. Let RP(v) be the set of all rooted
paths in v which end in a node that returns some attribute. Index v using each
element of RP(v) as key. The number of (key, value) pairs is thus in
O(|RP(v)|). The indexing keys for v_1 are book and book.title, for v_2 they are
book and book.author.last etc.

View lookup for RPI coincides exactly with the lookup for LPI. The lookup
of q_1 retrieves the definitions of the views v_1, v_2, v_5, v_6 and v_8, the
same as LPI. For q_2, RPI lookup retrieves the definitions of v_1, v_2, v_6 and
v_8. Observe that unlike LPI, RPI in this situation does not return v_5, which
indeed is not useful. We end by noting that this increase of precision of RPI
over LPI is not guaranteed. For example, an RPI lookup for q_3 retrieves the
definitions of all views in Figure 10, which is much less precise than LPI.

RPI completeness is established in a similar fashion to the LPI case. 

6 Experimental results

In this Section we present a set of experiments studying ViP2P performance.
Section 6.1 outlines the experimental setup. ViP2P attempts to speed up query
processing by exploiting pre-computed materialized views. This shifts the
complexity of extracting and sending interesting data across the network from
query processing to view materialization, to which we devote the most attention
in our experiments. Several parameters determine view materialization
performance: the distribution of the documents and views in the network, the
documents which contribute to each view, the size of the documents and views
etc. Section 6.2 starts by studying view materialization in the context of a
single peer. Then, Section 6.3 examines view materialization in the large, in
widely different network configurations, varying the number and the
distribution of publisher and consumer peers. Section 6.4 presents an
evaluation of the indexing strategies for query rewriting presented in
Section 5.2. Finally, Section 6.5 presents experiments that evaluate the
performance of the query execution engine.

6.1 Experimentation settings 

Infrastructure setup. We have carried out our experiments on the Grid5000
infrastructure (https://www.grid5000.fr), providing computational resources
distributed over nine major cities across France. Figure 11 shows the Grid5000
network topology. Sites are interconnected with a 10 Gbps network and, within
each site, nodes are interconnected with (at least) 1 Gbps Ethernet. The
hardware of Grid5000 machines varies from dual-core machines (of at least
1.6 GHz clock speed) with 2 GB of RAM to 16-core machines with 32 GB of RAM. We
settled for a random and heterogeneous distribution of hardware, in order to
be close to real P2P deployment scenarios. However, in some experiments, we
deliberately chose sites very far away from each other, almost at the two
opposite ends of the network, to show the scalability of our platform in the
most difficult scenarios possible within the Grid5000 network.

Data generation To have fine control over all the parameters impacting our 
experiments, we have used synthetic data, produced by two existing XML data 
generators: ToXGene [7] and MemBeR |J. 

Experimentation parameters. We summarize the main parameters characterizing
our experiments in Table 1. For each set S, we use |S| to denote the
size of the set. Thus, |P| is the number of peers in the network, etc. Finally, for
a document d, we use |d| to denote the size of d, measured in megabytes (MB).

Evaluation metrics In our measurements, we use the following metrics to 
characterize the system performance: 

• Materialization time is the time needed for the network to materialize
a set of views, populating them with the data extracted from all the
documents published in the network. The materialization time starts at the
instant when a peer initiates the first extraction of data and ends
when all peers have extracted and shipped the tuples to the
appropriate view holders.

• Tuple extraction time for a view v and a document d is the time needed 
for the publisher of d to extract from d the tuples which make up v(d). 



RR n° 7812 






Karanasos, Katsifodimos, Manolescu & Zoupanos 




Figure 11: Grid5000 network topology. 



• Storage time for a document d and a view v is the time taken by the 
consumer holding v, to add to the corresponding BerkeleyDB database 
the set of tuples corresponding to v(d). 

• Data exchange time for a document d and view v is the time needed 
for the tuples v(d) to be transferred across the network from the publisher 
of d to the consumer holding v. 

• Lookup time for a query q is the time needed for the peer asking q to 
lookup in the DHT the views that may be useful to rewrite q. 

• Embedding time for a query q and a set of views V is the time needed
by the query peer to verify which of the views may actually be used to
rewrite q. Recall from Section 5.2 that this is established by checking for
the presence of embeddings between each view v ∈ V and the query q [38].

• Query response time for a query q is the time elapsed between the mo- 
ment when the query has been posed, and the moment when its execution 
has finished (as observed at the query peer). 

• Time to first result for a query q is the time between the moment when 
the query has been posed, and the moment when its first result tuple has 
been received at the query peer. 

Symbol  Description
P       The set of peers in the network
Pd      The set of peers holding at least one document
V       The set of views in the network
Pv      The set of peers holding at least one view
D       The set of all published documents
Dv      The set of documents matching at least one view

Table 1: Parameters characterizing the experiments.

Whenever the query, view, or document are not specified for a given metric,
the metric value is understood to be the sum, over all the documents, views,
and queries used in the respective experiment, of the respective metric, with the
exception of the materialization time. By nature, this metric accounts for many
materialization processes running in parallel, and therefore is not the sum of
individual materialization times. For instance, assume publisher p1 publishes a
document which contributes data to a view at p2, while publisher p'1 similarly
contributes to a view at p'2. The peers p1 and p'1 will start the materialization
process at about the same time, by looking up views to which they could
contribute, etc. One of them will be the last to report that all its tuples have
been stored and acknowledged by the respective consumer peer. The
materialization time of this experiment spans from the first materialization
start event to the last materialization end event, while the two processes run
in parallel.
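
The difference between the materialization time and a sum of per-process times can be made concrete with a small sketch. The timestamps below are hypothetical values chosen for illustration, not measurements from the experiments.

```python
def materialization_time(processes):
    """Span from the earliest start event to the latest end event."""
    starts = [start for start, _ in processes]
    ends = [end for _, end in processes]
    return max(ends) - min(starts)


def summed_time(processes):
    """Sum of the individual durations, as used for the other metrics."""
    return sum(end - start for start, end in processes)


# Hypothetical (start, end) timestamps, in seconds, for two publishers
# whose materialization processes overlap.
runs = [(0.0, 40.0), (1.5, 52.0)]

print(materialization_time(runs))  # 52.0
print(summed_time(runs))           # 90.5
```

Because the two processes overlap, the span (52.0 s) is much smaller than the sum of the durations (90.5 s).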

6.2 View materialization in the small 

We start by studying the performance of extracting from a document d the
tuples corresponding to a view v, and sending these v(d) tuples from the peer
holding d to the one storing v. To focus exactly on the process of extraction,
we use very simple network settings. View materialization in more complex
settings and at larger scale is studied next.

Experiment 1: sequential vs. parallel extraction of views. As described
in Section 3.2, a ViP2P peer p is capable of simultaneously matching several
views v1, v2, ..., vk on a given document d residing at p. The corresponding
tuples v1(d), v2(d), ..., vk(d) are extracted during a single traversal of the
document d, instead of k traversals (one for each of the k views). This is
important when publishing a document d in case the publisher finds out that
many previously defined views could match d, and therefore it has to match
all of them against d. While parallel extraction is faster, it may require more
memory, since matches for the various views have to be constructed and kept
in memory at the same time.
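
The idea of matching k views in one traversal instead of k traversals can be sketched as follows. This is a strong simplification and not the ViP2P matching algorithm: each view is assumed to be a single-label pattern of the form //t that returns the text of the matching elements, and the function names are hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def extract_parallel(xml_bytes, labels):
    """One pass over the document, collecting matches for every //label view."""
    wanted = set(labels)
    matches = {label: [] for label in labels}
    for _, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag in wanted:
            matches[elem.tag].append((elem.text or "").strip())
    return matches


def extract_sequential(xml_bytes, labels):
    """k passes over the document, one per view."""
    return {label: extract_parallel(xml_bytes, [label])[label] for label in labels}


doc = b"<r><t1>a</t1><t2>b</t2><x><t1>c</t1></x></r>"
views = ["t1", "t2"]

print(extract_parallel(doc, views))  # {'t1': ['a', 'c'], 't2': ['b']}
```

Both functions return the same tuples; the single-pass variant parses the document once but keeps the matches of all views in memory together, which mirrors the time/memory trade-off described above.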

Our first experiment studies the effect of extracting data for several views in
parallel. We use a document d and two distinct sets of views. First, we consider
a four-view set of the form {//(t_i)_{ID}} for i ∈ {1, ..., 4}. Second, we consider a
larger set including views of the form {//(t_i)_{ID}} for i ∈ {1, ..., 8}. The views
and d are chosen so that d contributes 130,000 tuples to each published view v_i.
The parameters characterizing the experiment are as follows:

|P|  |Pd|  |V|     |Pv|  |D|  |Dv|  |d|
2    1     {4, 8}  1     1    1     100 MB



Figure 12 (left) depicts the extraction time when extracting data out of d 
for four and for eight views, in a parallel and sequential fashion. We observe 
that parallel extraction accelerates data extraction (in this case, up to 40%). 
Therefore, we will always use parallel extraction in the subsequent experiments. 

Figure 12: Experiment 1: parallel vs. sequential extraction time (left);
experiment 2: view materialization over different-size documents (right).

Experiment 2: studying one data transfer pipe. We now study the
materialization of documents of various sizes, in order to identify the bottleneck
of the materialization process. Possible bottlenecks are (i) data extraction at
the document publisher; (ii) network bandwidth between a consumer and a
publisher; (iii) view storage time at the consumer. For this experiment, the
following parameters are used:

|P|  |Pd|  |V|  |Pv|  |D|  |Dv|  |d|
2    1     1    1     1    1     {100, ..., 500} MB

One peer plays the role of the publisher, while the other is the consumer.
The peers are located at two opposite ends of France (Lille and Grenoble). The
document and the view are chosen so that the complete content of the document
is extracted and sent to the consumer; thus, the materialized view size increases
linearly with the size of the document.

Let us now detail the synchronization of the various processes involved when 
a publisher sends data to a consumer to be added in a view. 

1. The publisher extracts data locally. After all the tuples from v(d) have
been computed, the publisher starts sending them to the consumer (see footnote 2).

2. Packets of tuples are sent over the network to the consumer in an asyn- 
chronous way using buffers at the consumer side. 

3. At the consumer, a thread picks packets of tuples from the buffer and 
stores them in the BerkeleyDB database. 

The buffer at the consumer can be parameterized to control the data transfer 
speed: when the buffer is full because the storage thread is not sufficiently fast, 
data transfer stalls. For this experiment, the size of the data buffer was set to 
unlimited (making sure in advance that the memory of the consumer is enough 
to store all the produced tuples), so that the data exchange thread can use as 
much as possible of the available bandwidth between the two peers. 
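
The interplay between the sender, the consumer-side buffer and the storage thread can be sketched with a producer/consumer queue. This is a minimal illustration under stated assumptions: the `transfer` function is hypothetical, the network is replaced by a direct `put` into the buffer, and a Python list stands in for the BerkeleyDB database.

```python
import queue
import threading


def transfer(packets, buffer_size=0):
    """Ship packets through a buffer to a storage thread.

    buffer_size=0 means an unbounded buffer, as in this experiment.
    """
    buffer = queue.Queue(maxsize=buffer_size)
    stored = []

    def storage_thread():
        while True:
            packet = buffer.get()
            if packet is None:      # sentinel: the sender is done
                break
            stored.extend(packet)   # stand-in for a BerkeleyDB insert

    worker = threading.Thread(target=storage_thread)
    worker.start()
    for packet in packets:
        buffer.put(packet)          # blocks while a bounded buffer is full
    buffer.put(None)
    worker.join()
    return stored


packets = [[(i, j) for j in range(3)] for i in range(5)]
stored = transfer(packets, buffer_size=2)
print(len(stored))  # 15
```

With a bounded buffer, `put` blocks whenever the storage thread falls behind, which corresponds to the stalled data transfer described above; with an unbounded buffer the sender never waits, at the cost of memory at the consumer.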



Figure 12 (right) depicts the time needed for the view tuples to be (i) extracted
from the document, (ii) sent over the network and (iii) stored in BerkeleyDB
at the consumer. We observe that the three times increase linearly in the
size of the data. Data extraction is the slowest component; overall, however,
the three times are comparable (also recall that the network connection is fast
within the Grid, thus transfer times may be higher in other contexts).

Footnote 2: This could be improved to parallelize extraction and sending in some
cases, but there are fundamental limitations: for some of the views we support,
one needs to wait for the full traversal of the document before producing an
output tuple [19].

Figure 13: Outline of a controlled synthetic document for our experiments (left);
experiment 3: view extraction and materialization time depending on the number
of consumers (right).

Conclusion. From the above two experiments, we conclude that (i) parallelizing
data extraction reduces the time to compute view tuples; (ii) extraction
time grows linearly with the size of the input document and (iii) data transfer
and data storage time grow linearly with the size of the extracted tuples.

6.3 View materialization in larger networks 

We now consider view materialization in larger and more complex environments, 
with many publishers and/or many consumers. 

Documents. For these experiments, we needed to tightly control which parts of
the published data are relevant to which views on each peer. Therefore, unless
stated otherwise, we rely on documents whose shape is outlined on the left of
Figure 13. There are always 64 camera elements under one catalog, and each
camera has 4 children. To obtain different document sizes, we insert text of
varying length in the description of each camera.

Experiment 3: one publisher, fixed data, varying number of consumers.
In this experiment, we use a single publisher, a fixed data set (5 documents
of 50 MB each), and a varying number of consumers (from 1 to 64).
Each consumer always holds exactly one view. All the published data is relevant
for some view and moreover, the view contents do not overlap, i.e., the data is
practically "partitioned" over the views. Thus, when there is a single consumer,
its view stores the content (cont) of all cameras from the catalog. When there
are two consumers, the view of the first consumer stores the cont of the cameras
from camera1 to camera32, while the other consumer's view stores the cont of the
rest of the cameras (camera33 to camera64), and so on. This way, the views absorb
all the data published. The producer is located in Lille and the consumers in
Sophia-Antipolis (two opposite ends of France). The parameter values for this
experiment are given in the table below:

|P|  |Pd|  |V|                  |Pv|                 |D|  |Dv|  |d|
65   1     {1, 2, 4, ..., 64}   {1, 2, 4, ..., 64}   5    5     50 MB
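
The partitioning of the 64 cameras over the consumers described for this experiment can be sketched as follows. The function name is hypothetical, and the sketch assumes (as the powers of two used in the experiment suggest) that the number of consumers divides the number of cameras.

```python
def partition_cameras(num_consumers, num_cameras=64):
    """Assign a contiguous, equally sized range of cameras to each consumer."""
    if num_cameras % num_consumers != 0:
        raise ValueError("the number of consumers must divide the number of cameras")
    size = num_cameras // num_consumers
    return [(c * size + 1, (c + 1) * size) for c in range(num_consumers)]


print(partition_cameras(1))  # [(1, 64)]
print(partition_cameras(2))  # [(1, 32), (33, 64)]
```

Each pair gives the first and last camera whose content ends up in the view of one consumer; with 64 consumers every view holds a single camera.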

Once the tuples are extracted by a publisher, they can be shipped to the view
holders sequentially (the publisher contacts the consumers one after the other)
or in parallel (the publisher ships all the tuples to all consumers concurrently).
Figure 13 (right) shows the time needed to extract the tuples, and
the materialization time for the two variations of tuple sending: sequential or
parallel. In both cases, as expected, the extraction time is the same and it
increases linearly with the number of consumers.

Figure 14: Experiment 4: one publisher, varying size of data, 64 consumers
(left); experiment 5: 64 publishers, varying data size, one consumer (right).

When sending tuples sequentially, we observe that the materialization time 
increases linearly with the number of consumers (views). In the case of 64 
consumers, data extraction takes about 45 seconds, but materialization takes 
about 200 seconds. Materialization time increases drastically with sequential 
tuple sending since more and more consumers need to be contacted one after 
another. 

When sending tuples in parallel, we observe that the materialization time is 
notably lower than in the case of sequential tuple shipping and that its slope is 
almost the same as the one of the extraction time. This is because, as soon as 
the tuples are extracted, a pool of threads (one thread for each packet of tuples) 
takes over the task of shipping all the tuples in parallel. The bottleneck in this 
situation is the upload link of each consumer. 
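
The two shipping modes can be contrasted in a short sketch. Everything here is an illustrative assumption: `send` simulates a network transfer with a fixed delay, and a bounded thread pool stands in for the pool of sender threads mentioned above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def send(consumer, tuples, delay=0.01):
    """Stand-in for a network transfer of one batch to one consumer."""
    time.sleep(delay)
    return consumer, len(tuples)


def ship_sequential(batches):
    """Contact the consumers one after the other."""
    return [send(consumer, tuples) for consumer, tuples in batches.items()]


def ship_parallel(batches, workers=8):
    """Hand every batch to a thread pool and ship them concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(send, c, t) for c, t in batches.items()]
        return [future.result() for future in futures]


# Hypothetical batches: 10 tuples for each of 8 consumers.
batches = {f"consumer{i}": [("tuple", i)] * 10 for i in range(1, 9)}

print(sorted(ship_sequential(batches)) == sorted(ship_parallel(batches)))  # True
```

Both modes deliver the same batches. In the sequential mode the simulated delays add up, while in the parallel mode they overlap, which is the effect observed in the materialization times of this experiment.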

Experiment 4: one publisher, varying data size, 64 consumers. We
study how materialization time is affected when the total size of published data
is increased. We use one publisher. The size of the published data varies from
64 MB to 1024 MB.

Each of the 64 consumers holds one view of the form //catalog//cameraK_{cont},
where K varies according to the peer that holds the view. For example, the
first consumer holds the view //catalog//camera1_{cont}; the second holds the
view //catalog//camera2_{cont}, etc. This way, from each document the publisher
extracts 64 tuples, each of which is sent to a different consumer. All the content
of the documents is absorbed by the 64 views. The parameter values used for
this experiment are:

|P|  |Pd|  |V|  |Pv|  |D|              |Dv|             |d|
65   1     64   64    {64, 512, 1024}  {64, 512, 1024}  1 MB

As in Experiment 3, we run two variations of the same experiment: (i)
one for sequential tuple sending and (ii) one for parallel tuple sending. The
graph at left in Figure 14 shows, as expected, that the materialization time
increases linearly with the size of data published in the network in both cases.



Inria 



The ViP2P Platform: XML Views in P2P 






It also shows that the materialization time in the case of parallel tuple sending 
is considerably shorter (about 3000 sec. instead of 11500 sec. for absorbing 
1024MBs of data). 

Experiment 5: 64 publishers, varying data size, one consumer. We now
study the potential for parallel publishing, i.e., the impact of the number of
(simultaneous) publishers on the capacity of absorbing the data into a single view.
The published data size varies from 64 MB to 3.2 GB, and all the published
data ends up in the view. The parameter values for this experiment are:

|P|  |Pd|  |V|  |Pv|  |D|              |Dv|             |d|
65   64    1    1     {64, ..., 3200}  {64, ..., 3200}  1 MB



Recall from Section 3.2 that the view materialization module maintains a
queue of tuple-send requests and allows only a certain number of
tuple-extractors to send data to it concurrently. In this experiment we test two
modes of tuple-receiving concurrency: (i) the consumer accepts only one
tuple-send request at any given time (sequential tuple receiving); (ii) the consumer
accepts at most 64 tuple-send requests concurrently (parallel tuple receiving).
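
The admission limit at the consumer can be sketched with a semaphore. This is an illustrative assumption rather than the actual materialization module: the `Consumer` class and its fields are hypothetical, the semaphore's waiters play the role of the request queue, and storing is reduced to counting tuples.

```python
import threading


class Consumer:
    """Accepts at most max_concurrent tuple-send requests at any time."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self.active = 0   # requests currently being served
        self.peak = 0     # highest number of requests served at once
        self.stored = 0   # total number of tuples received

    def receive(self, tuples):
        with self._slots:                  # wait here until a slot is free
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            count = len(tuples)            # stand-in for storing the tuples
            with self._lock:
                self.stored += count
                self.active -= 1


def run(max_concurrent, publishers=16, tuples_each=5):
    """Let several publisher threads send to one consumer."""
    consumer = Consumer(max_concurrent)
    threads = [
        threading.Thread(target=consumer.receive, args=([p] * tuples_each,))
        for p in range(publishers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return consumer


sequential = run(max_concurrent=1)
print(sequential.peak, sequential.stored)  # 1 80
```

Setting `max_concurrent=1` corresponds to sequential tuple receiving; a larger value (64 in the experiment) corresponds to parallel tuple receiving.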



Figure 14 (right) depicts the materialization time as the size of the published 
data increases. We observe that the materialization time increases proportion- 
ally to the size of published data in both sequential and parallel tuple receiving 
modes. Also, parallel tuple receiving reduces the view materialization time by 
more than 50% (600 sec. instead of about 1400 sec. to absorb 3.2GBs of data). 

From the two graphs in Figure 14 we conclude that it is faster for the
network to absorb data using one consumer and many publishers rather than
many consumers and one publisher. For example, for absorbing 1024 MB of
data, the view materialization time is less than 200 seconds (Figure 14 right)
for 64 publishers and one consumer, and about 3000 seconds in the case of one
publisher and 64 consumers (Figure 14 left). This is explained by the fact that
data extraction was shown to be the slowest stage (Experiment 2); thus it is
slow for a single peer to extract all the available data by itself and ship it
to the consumers.

Experiment 6: varying number of publishers, fixed data, one consumer.
The purpose of this experiment is to study the impact that the
parallelization of document publication has on the view materialization time. We use
250 MB of data distributed evenly across an increasing number of publishers.
First, one peer publishes all the data, then two peers publish half of the data
each, then 4, then 8 peers, etc. The parameter values for this experiment are as
follows:

|P|  |Pd|            |V|  |Pv|  |D|  |Dv|  |d|
65   {1, 2, ..., 64} 1    1     512  512   0.49 MB



Figure 15 (left) shows how materialization time varies depending on the 
number of parallel publishers. The time decreases as the data is distributed 
to two and then 4 publishers, as the extraction effort is parallelized. From 8 
publishers onwards, the materialization time increases again, until it stabilizes 
from 32 to 64 publishers. This increase is due to publishers simultaneously 
trying to connect to the consumer and making the consumer's storage module 
the bottleneck. 

Figure 15: Experiment 6: publishing the same amount of data from an increasing
number of publishers (left); experiment 7: publishing varying size of data in
50 groups of 5 peers each (right).

Experiment 7: community publishing. We now consider a more complex
scenario. We study materialization time in a setting with (logical) sub-networks,
i.e., such that no single publisher has data of interest to all views, and no single
view needs data from all publishers. The parameters of this experiment are:

|P|  |Pd|  |V|  |Pv|  |D|                |Dv|               |d|
250  250   50   50    {20K, ..., 160K}   {20K, ..., 160K}   1 MB

We use a network of 250 peers, each of which holds the same number of 1MB 
documents. We logically divide the network into 50 groups of 5 peers each, such 
that in each group there are five publishers and one consumer (one peer is both 
a publisher and a consumer). The data of all publishers in a group is of interest 
for the consumer of that group, but it is not relevant for any of the other groups' 
views. The group peers are randomly chosen, i.e., they do not enjoy any special 
geographic or network locality etc. The total amount of data published (and 
shipped to the views) varies from 20 GB to 160 GB. Figure 15 (right) shows
that the materialization time grows linearly with the published data size.

Conclusion. This section has studied several extreme cases of view
materialization (very skewed / very evenly distributed, with one or many publishers
or consumers, etc.), in order to traverse the space of possibilities. Overall, the
experiments demonstrate the good scalability properties of ViP2P as the data
volume increases, and that ViP2P exploits many parallelization opportunities
when extracting, sending, receiving and storing view tuples. Table 2 summarizes
the results by providing a global metric, the view materialization throughput,
reflecting the quantity of data that can be published (from documents to views)
simultaneously in the network. Table 2 demonstrates that ViP2P properly
exploits all opportunities for parallelism in the "community publishing" scenario:
the throughput reaches 238 MB/s, while the best comparable result in this area,
from KadoP, is only 0.33 MB/s [3].
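
The throughput figures can be cross-checked with simple arithmetic. The sketch below only reuses numbers quoted in this section (the Table 2 value for community publishing, the KadoP figure, and the Experiment 5 data size and approximate time); the helper name is hypothetical.

```python
def throughput_mb_per_s(data_mb, seconds):
    """Data absorbed into views per second."""
    return data_mb / seconds


vip2p_community = 238.80  # MB/s, community publishing (Table 2)
kadop = 0.33              # MB/s, as quoted for KadoP

ratio = vip2p_community / kadop
print(round(ratio))  # 724

# Experiment 5: 3200 documents of 1 MB absorbed in about 600 seconds
# with parallel tuple receiving (the time is approximate in the text).
print(round(throughput_mb_per_s(3200, 600), 2))  # 5.33
```

The ratio of roughly 724 corresponds to almost three orders of magnitude, and the Experiment 5 estimate of about 5.33 MB/s is in line with the 5.31 MB/s reported for that experiment in Table 2.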

6.4 View indexing and retrieval evaluation 

We now compare the view indexing and lookup strategies LI, RLI, LPI and RPI
described in Section 5.2.

Experiment 8: view indexing and retrieval. We start with a random
synthetic query q of height 5, having 30 nodes labeled a1, ..., a30. Each node of q
has between 0 and 2 children. We then create three variants of q:

• q' has the same labels as q, but totally disagrees with q on the structure
(if ai is an ancestor of aj in q, ai is not an ancestor of aj in q')

• q" coincides with q for half of the query, while the other half conserves the
labels of q but totally disagrees on the structure (as in q')

• q'" has the same structure as q, half of it has the same labels a1, ..., a15,
while the other half uses a different set of labels b1, ..., b15 (that replace
a16, ..., a30 respectively).

Exp. No.  Experiment description                                   Throughput (MB/sec)
3         One publisher, fixed data, varying number of consumers   10.30
4         One publisher, varying data size, 64 consumers           0.34
5         64 publishers, varying data size, one consumer           5.31
6         Varying number of publishers, fixed data, one consumer   8.05
7         Community publishing                                     238.80

Table 2: Maximum data absorption throughput during view materialization.

From each of q, q' , q" and q'" we automatically generate 360 views of 2 to 5 
nodes, totaling 1440 views, such that: the views can all be embedded into their 
respective queries, i.e. those generated from q can be embedded in q, those 
generated from q' can be embedded in q' and so on. We, thus, obtain a mix of 
views resembling the original query q to various degrees. 

We have indexed the resulting 1440 views in a network of 250 peers, following
the LI, RLI, LPI and RPI strategies described in Section 5.2. We then
performed lookups using the four different indexing strategies. The parameters
characterizing this experiment are the following:

|P|  |Pd|  |V|   |Pv|  |D|  |Dv|  |d|
250  -     1440  250   -    -     -



Figure 16 (left) depicts the number of views retrieved by each strategy, com- 
pared to the number of useful views, which can be embedded into q. We observe, 
as expected, that the path indexing-lookup strategies (LPI and RPI) are more 
precise than the label based ones (LI and RLI). Moreover, LPI is the most 
precise, since it uses as keys longer paths, describing views more precisely. 

Figure 16 (right) depicts the time spent looking up in the DHT the set of
(possibly) useful views in order to rewrite q, as well as the time spent to check
whether embeddings exist from those views into q. We observe that from this
angle, the label strategies (LI and RLI) perform better than the path strategies,
since the more numerous lookups performed by the path strategies take up too
much time when processing queries.



Figure 17 (left) depicts the number of view definitions that were indexed in
the DHT by each view indexing strategy. Figure 17 (right) depicts the number
of lookups performed by each strategy for the query we consider. As expected,
LI inserts the largest number of DHT entries. With respect to query-driven
lookup, LI and RLI perform 30 lookups, far fewer than LPI and RPI, which
perform 370 lookups each.

Figure 16: Experiment

Figure 17: Experiment 8: lookups generated for retrieving views (left);
embedding vs lookup time (right).



From this experiment, we conclude that label-based strategies are preferable, 
since the savings at query processing time are more critical than the DHT 
index size (which is very modest in all cases) or the precision of look-up, as 
the retrieved view definitions are further filtered at the query peer (after the 
embedding filtering, the rewriting is run with the same set of views no matter 
the used strategy). 

6.5 Query engine evaluation 

Experiment 9: query response time vs. query selectivity and number 
of results We now investigate the query processing performance as the data 
size increases. We use 20 peers, all of which are publishers, 2 are consumers 
and 1 is a query peer. The query peer and the 2 consumers are located in 3 
different locations of France (Bordeaux, Lille and Orsay) . The parameter values 
characterizing this experiment are the following: 

|P|  |Pd|  |V|  |Pv|  |D|             |Dv|            |d|
20   20    2    2     {20, ..., 500}  {20, ..., 500}  0.5 MB

The document used in this experiment is the same as the one of Figure 13
(left) with a slight difference: the root element catalog has only one child, named
camera.

Figure 18: Experiment 9: query execution time vs. number of result tuples for
three queries (each panel plots the query response time and the time to first
result against the result size in tuples).

The views defined in the network are the following:

• v1 is //catalog_{ID}//camera_{ID}//description_{ID,cont}

• v2 is //catalog_{ID}//camera_{ID}//{description_{ID}, price_{ID,val}, specs_{ID,cont}}

Each view stores one tuple from each document. A v1 tuple from document d
roughly contains all of d (since the description element is the most voluminous
in each camera). A v2 tuple is considerably smaller since it does not store the
full camera descriptions. We use three queries:

• q1 asks for the description_{cont}, specs_{cont} and price_{val} of each camera. To
evaluate q1, ViP2P joins the views v1 and v2. Observe that q1 returns
full XML elements, and in particular, product descriptions, which are
voluminous in our data set. Therefore, q1 returns roughly all the published
data (from 10 MB in 20 tuples, to 250 MB in 500 tuples).

• q2 requires the description_{ID}, specs_{ID} and price_{ID} of each camera. This
is very similar to q1 but it can be answered based on v2 only. The returned
data is much smaller since there are only IDs and no XML elements: from
2 KB in 20 tuples, to 40 KB in 500 tuples.

• q3 returns the specs//sensor_type_{val} of each camera. The rewriting of
q3 applies navigation over specs_{cont} that is stored by v2. The result size
varies from 2 KB in 20 tuples to 40 KB in 500 tuples.

Figure 18 shows the query response time and the time to get the first result
for the 3 queries. The low selectivity query q1 (at left in Figure 18) takes longer
than q2, due to the larger data transfers and the necessary view join. The time
to first result is always constant for both q1 and q2 and does not depend on
the result size. For q1, a hash join is used to combine v1 and v2, and thus no
tuple is output before the view v2 has been built into the buckets of the hash
join. This is done in more or less one second in the case of q1 and about 300 ms
for q2. Note that the join is performed on the peer holding v1 as it is faster to
transfer v2 to the peer holding v1. Increases in the total running time appear
when more data-sending messages are needed to transfer increasing amounts of
results. For q3, which applies navigation on the view v2, the time to the first
tuple is the time to evaluate the navigation query locally at v2's peer and send
the first message with result tuples to the query peer, and this does not grow
with the data size.
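
The reason why a hash join delays the first result until one input has been fully consumed can be seen in a minimal sketch. This is a generic in-memory hash join, not the ViP2P physical operator; the row layout and the column names (`camera`, `price`, `description`) are hypothetical.

```python
def hash_join(build_side, probe_side, key):
    """Build a hash table on one input, then stream the other through it."""
    buckets = {}
    for row in build_side:            # build phase: must finish before any output
        buckets.setdefault(row[key], []).append(row)
    for row in probe_side:            # probe phase: results stream out one by one
        for match in buckets.get(row[key], []):
            yield {**match, **row}


# Hypothetical tuples: v2 is the (smaller) build side, v1 the probe side.
v2_rows = [{"camera": 1, "price": 300}, {"camera": 2, "price": 450}]
v1_rows = [
    {"camera": 1, "description": "compact"},
    {"camera": 2, "description": "reflex"},
]

results = list(hash_join(v2_rows, v1_rows, "camera"))
print(results[0])  # {'camera': 1, 'price': 300, 'description': 'compact'}
```

The build phase depends only on the size of the build input, which is consistent with a time to first result that stays constant as the number of result tuples grows, while the probe phase produces results incrementally.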
Conclusion. The ViP2P query processing engine scales close to linearly when
answering queries in a wide-area network. The fact that ViP2P rewrites queries
into logical plans which are then passed to an optimizer enables it to take
advantage of known optimization techniques used in XML and/or distributed
databases, to reduce the total query evaluation time, and (depending on the
characteristics of the particular physical operators chosen) the time to the first
answer. Given the ViP2P architecture, the peers involved in processing a query
are only those holding the views used in the query rewriting; this is why using
only 20 peers for this experiment does not affect its interpretation, since ViP2P
query processing involves only three peers. The network size may only impact
the view look-up time, which is very modest (Section 6.4).



6.6 Conclusion of the experiments 

Our study leads to the conclusion that the ViP2P architecture scales up well. 
In particular, view materialization scales in the number of publishers and con- 
sumers, in the size of the network, and in the size of the data. High contention 
at a single consumer receiving data from many publishers, and especially at a 
single publisher contributing to many consumers' views, degrades the ability 
of the view holders to efficiently absorb data. However, these contention ef- 
fects are to be expected in a large distributed system. Moreover, we showed 
that when interest in the published data is more evenly distributed among sub- 
communities, ViP2P takes advantage of all parallelization opportunities to in- 
crease the data transfer rate between publishers and consumers by 3 orders of 
magnitude. Our view materialization experiments also show the importance 
of carefully tuning all stages in the data extraction and data transfer process, 
including asynchronous communication and parallelization whenever possible. 
The cumulative impact of these optimizations on the data transfer rate between
peers is dramatic (an increase of more than 4 orders of magnitude).

Our query processing experiments show that label-based view indexing strate- 
gies are preferable, and indeed we use RLI by default. They also demonstrate 
that the ViP2P execution engine scales linearly up to large data volumes, orders
of magnitude more than in previous real DHT deployments [3, 33].
7 Conclusion and perspectives

The efficient management of large XML corpora in structured P2P networks
requires the ability to deploy data access support structures, which can be tuned
to closely fit application needs. We have presented the ViP2P approach for
building and maintaining structured materialized views, and processing peer
queries based on the existing views in the DHT network. Using DHT-indexed
views adds to query processing the (modest) cost of locating relevant views and
rewriting the query using the views, in exchange for the benefits of using
pre-computed results stored in views. We studied several view indexing strategies
and associated complete view lookup methods. Moreover, we carried out an
extensive study of our platform's main aspects (view materialization, indexing and
retrieval, and query processing) in different scenarios and settings. ViP2P was
able to extract and disseminate 160 GB of data in less than 15 minutes over
250 computers in a WAN [1]. These results largely improve over the
closest competing XML management platforms based on DHTs, and actually
implemented and deployed (1 GB of data indexed in 50 minutes in KadoP [3],
hundreds of MB of data on 11 peers in psiX [33], which, unlike us, focused only
on document indexing and look-up).

Many avenues for further research are open. An ongoing work built on
ViP2P, LiquidXML [12], automatically selects and continuously adapts a set of
materialized views on each peer, to improve query processing performance in the
network. Handling documents that contain references to each other and
evaluating tree pattern queries that span many documents are other interesting
developments.